Push the envelope on report design by using BIDS properties to
refresh and paginate onscreen reports, keep data regions together 
on a page, and produce multicolumn reports. 
Editor’s Tip


Are you coming to TechEd 
this month? Be sure to 
drop by and see us at the 
SQL Server Mag booth, and 
don't miss the sessions 
presented by our authors. 
—Sheila Molnar, 
senior editor 


Use T-SQL to Solve the Classic Eight Queens Puzzle
By Itzik Ben-Gan
Hone both your logic and programming skills by using
T-SQL to solve this logic puzzle.

Security Logging with Service Broker
By Tom Carpenter
Create message types, contracts, queues, and services
for an event-logging solution, and use simple ASP
code to call a SQL Server stored procedure that places
information in a security logging queue for automated
processing.

Data Warehousing: Degenerate Dimensions
By Michelle A. Poolet
Find out how to map control numbers to a fact table
and associate them with each line item.

Using Large CLR UDTs in SQL Server 2008
By Tyler Chessman
See how the VarBinaryComp UDT takes advantage of
new support for large types to store, compress, and
decompress up to 2GB of binary data.

Sharpen Your Basic SQL Server Skills
By Pinalkumar Dave
Discover the difference between Windows Authentication
Mode and Mixed Mode, and learn the names and uses of
installed system databases.

Backup and Restore Fundamentals
By Michelle A. Poolet
Back up your SQL Server databases regularly to prevent
lost data in the event of a disaster.

Who’s Hogging My Server?
By Andrew J. Kelly
Obtain performance metrics by using SQL Server 2005's
built-in sys.dm_exec_plan_attributes DMV.

T-SQL 101, Lesson 4
By William McEvoy
Thanks to the GROUP BY clause, you can write
SELECT queries that produce detailed reports.
Tool Time:
SQL Server Internals Viewer
By Kevin Kline
Use the four interfaces provided by this free tool to see how SQL Server stores data.

Market Watch:
Solid State Storage for SQL Server
By Lavon Peters
SSDs have been used in the military and aerospace industries for years, but are
they suitable for your enterprise?

Product Review:
SteelEye LifeKeeper Protection Suite for SQL Server
By John Green
Need high availability and clustering for SQL Server? If you don’t mind the lack
of granular data rewind capabilities, check out SteelEye’s suite.

Product Review:
HP ProLiant DL160
By Michael Otey
A powerful server from HP’s ProLiant line, this high-performance 1U 2-socket,
64-bit unit is perfect as a Web server, but if you're expecting support for Windows
Server 2008 Hyper-V, the DL160 can’t help.

Industry News:
Bytes from the Blog
Jeff James discusses the recent acquisition of MySQL AB by Sun Microsystems.

New Products
Check out new and improved SQL Server-related products from Fujitsu, NetPro,
Confio, and Secerno.
What Readers Want SQL Server to Include

In “Where SQL Server Should Go from Here,” February 2008, InstantDoc ID
97954, I wrote about some of the features that I
think should be included in the next SQL Server 
release and asked our readers for their feedback. 
Although it might seem to be too early to discuss 
future SQL Server versions because SQL Server 
2008 isn’t even scheduled to be released until third 
quarter 2008, Microsoft is already busy planning 
which features will be included in the next SQL 
Server release. Some of the features that I would 
like to see in a future SQL Server release include 
the ability to run SQL Server on Windows Server 
2008 Server Core, support for Language Integrat- 
ed Query (LINQ) in SQLCLR, and a development 
front end for SQL Server Service Broker. Here are 
some of the features that readers believe should be 
included in the next version of SQL Server. 

Several readers shared my frustrations with 
Service Broker not only because it lacks an inte- 
grated design tool, but also because of its restricted 
example sets that imply there are limitations to the 
product that aren’t actually there. One reader said,
“I think Microsoft may be doing themselves a dis-
service with Service Broker currently. It took a while
(two different SQL Server 2005 manuals and the 
Service Broker beta preview manual) but I figured 
out that you don’t have to use explicit XML in 
broker messages; just plain text can be used.” This 
reader’s comment reflects the fact that it’s difficult 
to get started using Service Broker, and its capa- 
bilities aren’t clearly documented and discoverable. 
However, an interactive development tool would 
make it easier to get started with Service Broker 
and see its underlying capabilities. 

Another feature that readers requested is an auto- 
mated forms building subsystem that’s modeled after 
SQL Server Reporting Services (SSRS). One reader 
said, “I think that SQL Server needs to come with a 
data entry solution. Microsoft has revolutionized the 
way that we do reporting. I think that it might be safe 
to say that SSRS is the second most important thing 
to come out of Redmond this past decade. I work 
for a company that builds SQL Server solutions for 
large retailers. We spend about 1/10th of our time 
doing reports and 90 percent of our time building
dumb Web-based data entry forms. If Microsoft
were to come out with ‘SQL Server Forms Services,’
it could make our offerings a lot more stable, predict- 
able, and logical. Make it look like SSRS and give 
us 100 different ways to make a data entry form.” 
This reader’s idea makes a lot of sense to me. An au- 
tomated forms building tool would not only be use- 
ful for organizations of all sizes, but also could be 
bundled with SQL Server Express to compete with 
Oracle Database Lite, which includes a similar tool. 
Typically, the first thing many businesses want to do 
with a new database is automatically generate Web- 
based forms. 

I received a variety of small requests from 
readers as well. A reader captured the essence of 
these requests in the following message: “What 
would be nice: a transaction log reader, the ability 
to compare SQL Objects a bit easier (they had a 
command-line compare tool in SQL Server 2005, 
maybe this has been enhanced in SQL Server 
2008), and an improved SQL Server Management 
Objects (SMO) (although programming database 
transfers in SMO is much better than SQL Server 
Integration Services—SSIS—in 2005, it still dies 
on databases with lots of objects).” I agree; a util-
ity to view and navigate the transaction log would be
useful. Although the ability to compare databases
exists in Visual Studio Team System for Database 
Professionals, many organizations would prefer for 
it to be included with the built-in SQL Server man- 
agement tools. SQL Server has come a long way 
in dealing with large databases; however, a better 
performing management framework would put it 
in a more favorable position when compared with 
Oracle or DB2. 

Don’t get me wrong, Microsoft’s SQL Server 
team has done a great job of covering the bases 
with SQL Server, and SQL Server’s features com- 
pare very favorably to Oracle and DB2’s features. 
However, there’s no doubt that as SQL Server 
moves forward, it will need to include more useful 
and practical features. Thanks to the readers who
contributed their thoughts on this subject.

InstantDoc ID 98866


Michael Otey (mikeo@windowsitpro.com) is technical
director for Windows IT Pro and SQL Server
Magazine and coauthor of SQL Server 2005
Developers Guide (Osborne/McGraw-Hill).
Editor’s Tip

Share your SQL Server code, comments, discoveries,
and solutions to problems. Email your contributions
to r2r@sqlmag.com. Please include your full name and
phone number. We edit submissions for style, grammar,
and length. If we print your submission, you'll
get $100.
—Karen Bemowski, 
senior editor 
Adventures with Big Data: How to Import
16 Billion Rows into a Single Table

So there I was, writing some SQL Server 2005
Reporting Services (SSRS) reports for a user
when a coworker asked if I could process some data for 
him. I’m the SQL Server guy at work, so I immediately 
volunteered to help. 

The data arrived several days later on a 1TB 
external hard drive that contained 20,000 flat files 
taking up 400GB of space. The data consisted of 
latitude and longitude coordinates and three extra 
informational fields. I had no idea how many records 
the raw data represented because not all the latitude 
and longitude coordinates were to be included. For 
latitude and longitude coordinates to be included, their 
extra fields had to match certain criteria. 

The first step was to reduce the record size to the 
absolute bare minimum. I discovered that the latitude 
and longitude coordinates were in 1/120-degree incre- 
ments, so I rounded those floating-point values to 
small integers by multiplying them by 120 and trun- 
cating the remainder. The remaining fields were also 
rounded after multiplication, thus reducing my final 
record size to 9 bytes. 
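
The rounding step can be sketched in a few lines. This is only an illustration in Python (the work described here was done in Visual Basic .NET), and the function name `to_grid_index` and the sample coordinates are hypothetical. It shows why the scaled values are small: latitude times 120 stays within plus or minus 10,800 and longitude times 120 within plus or minus 21,600, so each fits in a 2-byte signed integer.

```python
import struct

GRID_STEPS_PER_DEGREE = 120  # coordinates arrive in 1/120-degree increments


def to_grid_index(degrees: float) -> int:
    """Scale a coordinate by 120 and truncate the remainder."""
    return int(degrees * GRID_STEPS_PER_DEGREE)  # int() truncates toward zero


# Hypothetical sample point; both values are exact multiples of 1/120 degree.
lat_index = to_grid_index(40.5)
lon_index = to_grid_index(-74.25)

# Two signed 16-bit integers occupy 4 bytes in total.
packed = struct.pack("<hh", lat_index, lon_index)
print(lat_index, lon_index, len(packed))
```

One caution about truncation: binary floating point can't represent every multiple of 1/120 exactly, so a product can land a hair below the intended integer. Whether the original application guarded against that isn't stated.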

Unfortunately, there was no way of directly 
importing the data into SQL Server because the flat 
files were in a proprietary data format. So, I wrote a 
Visual Basic .NET 2.0 application that iterated through 
each file and created a pipe-delimited output file that I 
could subsequently import into SQL Server using the 
BULK INSERT statement. I chose to bulk insert the 
records because submitting them through ADO.NET 
would be much slower. 

The final piece of the puzzle was to define a clus- 
tered index on the table consisting of the latitude and 
longitude fields. I hoped this would give me instant 
access to all related data once lookups began. I knew 
that inserting unsorted records into a table with a 
defined clustered index would be slow, but I knew of 
no way to presort the records, so I went for it. 

The import operation started off well. A half 
million records were being imported in a single shot 
every 10 seconds. So, I let the import operation run 
a few hours while I attended to some other tasks. 
When I came back, I discovered that my upload speed 
had drastically decreased. As my table of imported 
values grew, the cost of inserting more records into 
the clustered index became prohibitive to the point 
where my ADO.NET connection was timing out after
10 minutes—a far cry from the 10-second imports at 
the beginning. Clearly, I couldn’t import the data into 
a table with a defined clustered index. 

Plan B was simple: I would remove the clustered 
index and run the import operation again. My thinking 
was that applying the index after importing the 
data would take much less time. After three days of 
importing, my data had been migrated from thousands 
of flat files into a single SQL Server table. It consisted 
of 16 billion rows and took up 250GB of space. The 
final step was to apply the clustered index. I wrote a 
script and set it running with fingers crossed. 




About 28 hours later, SQL Server had sucked up 
the 750GB of disk space left and began rolling back the 
index change. After another 24 hours, the rollback had 
completed and I was back to square one. As it turns 
out, applying a clustered index takes a lot of disk space. 
Plus, the entire operation is logged. Not only did my 
database balloon to 500GB but so did my log file. 

I considered several options at this point. The first 
was to define a partitioned table that would make 
subsequent indexing easier. Another was to physically 
split the database between multiple tables. Then, while 
doing some research on BULK INSERT, I came across 
an interesting detail: When bulk inserting data into a 
table with a defined clustered index, you can greatly 
speed things up by supplying already sorted data. 
All you have to do is use BULK INSERT’s ORDER 
parameter to specify an order hint telling SQL Server 
that the incoming data is ordered like the index. So, I 
came up with a new plan: Export the entire table of 
16 billion rows to disk, sort the data, and use BULK 
INSERT with the ORDER parameter to re-import the 
sorted data into a table whose clustered index is already 
defined. 

To export the data from SQL Server to a flat file, I 
tried using BCP because it’s fast. Unfortunately, BCP 
maintains a count of the records exported, which it 
prints in the DOS shell. I say “unfortunately” because 
BCP’s counter variable is a 4-byte integer that stops on 
row 2,147,483,647. With only an eighth of my database 
exported, I had to look for other options. 
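
The arithmetic behind that stopping point is easy to check. The sketch below assumes the counter is a signed 4-byte integer (which matches the row number quoted above) and uses the rounded figure of 16 billion rows; the variable names are illustrative only.

```python
INT32_MAX = 2**31 - 1          # largest value a signed 4-byte integer can hold
TOTAL_ROWS = 16_000_000_000    # rounded row count from the article

fraction_exported = INT32_MAX / TOTAL_ROWS
chunks_needed = -(-TOTAL_ROWS // INT32_MAX)  # ceiling division

print(INT32_MAX)                    # 2147483647
print(round(fraction_exported, 3))  # 0.134
print(chunks_needed)                # 8
```

So the export stalled at roughly 13 percent of the table, and a chunked export would need at least eight passes that each stay under the counter's limit.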

I ended up writing a SQL CLR procedure in C# 
that exported my table, row by row, to a flat file on the 
server. I defined my counter variable as a LONG data 
type to avoid the problem I encountered with BCP. I 
ran this procedure, and 19 hours later I had a 141GB 
flat file in binary format ready to be sorted. 

On the Internet, I discovered Ordinal Technologies’ NSort
(www.ordinal.com), which I could use for sorting. It promises
to sort a terabyte of data in 33 minutes, given sufficient
God-like hardware. NSort was a bit finicky to set up, but with
help from its developer, I managed to sort my data file
in 3.5 hours. The developer later told me that had I
used a larger memory parameter, it could have taken
only 2.5 hours.

I was finally ready to import the sorted data back 
into SQL Server. Luckily, I did a BULK INSERT 
test on 10 million rows to make sure my log wouldn't 
explode. It did. As it turns out, by default, BCP loads 
the entire input file into the destination table as a 
single transaction, which means the entire table is fully 
logged. In my case, the log space required was 10 times 
the size of my data. 

With a bit more reading, I discovered I could use 
BULK INSERT’s BATCHSIZE parameter to fix 
the log-space problem. This parameter tells BULK 
INSERT how many records it should commit at a time. 
I picked 100,000 and ran my test again. This time the 
log never exceeded 39MB. A distinct improvement over 
the 2GB it had ballooned to the first time. 
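
To make the two options concrete, here is a small Python helper that assembles a BULK INSERT statement carrying both an ORDER hint and a BATCHSIZE. It is a sketch under stated assumptions, not the script used for this import: the table name, file path, and the reuse of LatIndex and LonIndex as the clustered index columns are hypothetical, and file-format options are left out because the article doesn't list them.

```python
def build_bulk_insert(table, data_file, order_columns, batch_size):
    """Return a BULK INSERT statement with an ORDER hint and a BATCHSIZE.

    Identifiers are inserted as-is, so this is for illustration only and
    must not be fed untrusted input.
    """
    order_hint = ", ".join(f"{col} ASC" for col in order_columns)
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{data_file}'\n"
        f"WITH (\n"
        f"    ORDER ({order_hint}),\n"
        f"    BATCHSIZE = {batch_size}\n"
        f");"
    )


# Hypothetical names; only the batch size of 100,000 comes from the article.
statement = build_bulk_insert(
    table="dbo.Coordinate",
    data_file=r"D:\export\coordinates_sorted.dat",
    order_columns=["LatIndex", "LonIndex"],
    batch_size=100_000,
)
print(statement)
```

At 100,000 rows per batch, 16 billion rows work out to 160,000 separate commits, which is what keeps the log from growing with the size of the whole file.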

Satisfied that everything was in place and fully tested, 
I started the import operation again and settled in for the
expected 50-hour wait. You can imagine my excitement 
when I returned from a long weekend to discover the 
import had finished successfully. The resulting coordi-
nate table was the same physical size as before because
a clustered index takes virtually no extra space.

Next, I created a test location table that consisted 
of 200,000 latitude and longitude coordinates. I added 
new LatIndex and LonIndex fields in the location 
table to represent the integer versions of the original 
floating-point latitude and longitude coordinates. All 
that remained was to join the location table to the coor- 
dinate table on the indexed fields and return the results. 
Forty-five seconds later I was staring at 181,458 beau- 
tiful rows of data representing a join between 200,000 
locations and 16 billion coordinates. I had watched the 
CPU graph on the server while the SELECT statement 
ran and was gratified to see it pinned at 100 percent for 
the entire time, which means that disk I/O was clearly 
not a bottleneck. SQL Server is one lean, mean row- 
slinging machine. 

Here are some distilled tips from my experience: 

* The more data you have in a table with a defined
  clustered index, the slower it becomes to import
  unsorted records into it. At some point, it becomes
  too slow to be practical.
* If you want to export your table to the smallest
  possible file, make it native format. This works best
  with tables containing mostly numeric columns
  because they’re more compactly represented in
  binary fields than character data. If all your data is
  alphanumeric, you won't gain much by exporting it
  in native format. Not allowing nulls in the numeric
  fields can further compact the data. If you allow a
  field to be nullable, the field’s binary representation
  will contain a 1-byte prefix indicating how many
  bytes of data will follow.
* You can’t use BCP for more than 2,147,483,647
  records because the BCP counter variable is a 4-byte
  integer. I wasn’t able to find any reference to this
  on MSDN or the Internet. If your table consists
  of more than 2,147,483,647 records, you'll have
  to export it in chunks or write your own export
  routine.
* Defining a clustered index on a prepopulated
  table takes a lot of disk space. In my test, my log
  exploded to 10 times the original table size before
  completion.
* When importing a large number of records
  using the BULK INSERT statement, include the
  BATCHSIZE parameter and specify how many
  records to commit at a time. If you don’t include
  this parameter, your entire file is imported as a
  single transaction, which requires a lot of log space.
* The fastest way of getting data into a table with a
  clustered index is to presort the data first. You can
  then import it using the BULK INSERT statement
  with the ORDER parameter.
—Dmitry Mnushkin, application developer, 
Renaissance Reinsurance 
InstantDoc ID 98568. 
SQL Server Internals Viewer

4 tools that enable you to see how SQL Server stores data

When Danny Gould, a SQL Server and .NET
professional working in London’s financial dis- 
trict, first started learning about SQL Server internals, 
he found that it was difficult and time consuming to see 
what SQL Server was doing internally on an 8K data 
page. The SQL Server Internals Viewer is the result of 
Danny’s effort to take all of the cumbersome elements 
out of finding out how SQL Server stores data. 

The SQL Server Internals Viewer offers four main 
areas of functionality. Each of the following areas 
includes a GUI to help you visualize and navigate the 
internals of SQL Server: 

* Allocation Map: The Allocation Map lets you
  quickly and easily see how all 8K data pages,
  extents, and special purpose pages, such as the
  Global Allocation Map (GAM), Secondary Global
  Allocation Map (SGAM), Differential Change
  Map (DCM), and Bulk Change Map (BCM),
  are allocated to disk. The Allocation Map is the
  default view for the SQL Server Internals Viewer
  and automatically shows GAM and SGAM pages
  in blue and green, respectively, in the right pane of
  the screen. You can use the View menu to choose
  which allocation units you'd like to view or use the
  Database Explorer in the left pane to view the pages
  and extents of a given database or database object
  such as a table.
* Database Explorer: The Database Explorer, which
  is located in the left pane of the SQL Server Internals
  Viewer, lets you select a specific database, table,
  or index to view. You can also drill down to the
  Index Allocation Map (IAM), the root page, or the
  first page of an index or table, and then see their
  representation on the Allocation Map.
* Page Viewer: To view the entire contents of an 8K
  data page, including the headers and offset table
  (which are usually visible only via the Database
  Consistency Checker (DBCC) PAGE command),
  use the Page Viewer, which is shown in Figure 1.
  To access the Page Viewer, click a specific page in
  the Allocation Map or enter a page number in the
  Go to Page box at the top of the Allocation Map.
  The Page Viewer shows the page type, the number
  of records on a page, and where each record is
  physically located on the page. It represents data
  on each page as hexadecimal, but you can see a
  plain-text translation of any human-readable data
  (for example, data stored in data types such as
  NVARCHAR,
* SQL Editor: You can use the SQL Editor to execute a
  query and immediately see the effect of the query in
  the Allocation Map. The SQL Editor displays the
  affected records from the transaction log, as well as
  query results and messages. Pages that are affected
  are shown in the SQL Editor and can then be
  displayed in the Page Viewer or Allocation Map.


SQL SERVER INTERNALS VIEWER

BENEFITS: The SQL Internals Viewer provides you
with four interfaces that let you easily monitor how
data is stored in SQL Server.

SYSTEM REQUIREMENTS AND NOTES: SQL
Server 2008 July CTP or SQL Server 2005; .NET
Framework 2.0; Windows Vista, Windows Server
2003, Windows XP, Windows 2000

HOW TO GET IT: You can download the SQL Server
Internals Viewer from www.sqlinternalsviewer.com.


Figure 1: The Page Viewer


I encourage you to check out Danny’s blog, which
includes good information about SQL Server storage 
behavior, tips, and tricks, at sqlblogcasts.com/blogs/ 
danny/default.aspx. And as always, we want to hear 
your feedback on the Tool Time discussion forum 
at sqlforums.windowsitpro.com/web/forum/categories 



Spice up your reports for a different look

The report design environment in Business
Intelligence Development Studio (BIDS) 
for SQL Server 2005 provides properties 
that let you control the location, appearance, and be- 
havior of each element of your report. But the sheer 
number of properties can overwhelm new report 
developers. Although most users quickly become 
familiar with the properties they need to produce 
basic reports, the purpose of many of the remain- 
ing properties remains a mystery. As a result, report 
developers typically rely on the subset of properties 
they understand to try to produce specialized re- 
ports, often with less-than-stellar results. So here’s an 
opportunity to improve your report-designing exper- 
tise by learning about properties that automatically 
refresh and paginate onscreen reports, keep data re- 
gions together on a page when possible, and produce 
multicolumn reports. 


Online Reports 

When you design a report that will be viewed primari- 
ly online, you should consider whether to periodically 
refresh the report on screen, whether to display the re- 
port as a single page, and whether to repeat items on 
each page for reports with multiple pages. To achieve 
those effects, use the AutoRefresh, InteractiveHeight, 
and RepeatWith properties. 

Automatic refresh. For a report that a user will 
open once and manually refresh frequently through- 
out the day as the source data changes (e.g., an opera- 
tional dashboard), you can configure an automatic 
refresh interval. By default, the AutoRefresh property 
is set to 0, which disables the automatic refresh. To 
set a refresh interval for a report, open the report’s 
Properties pane in BIDS (if the Properties pane is 
not visible, press F4). Select Report in the report item 
drop-down list, and in the AutoRefresh property box, 
type a positive integer value to indicate the number of 
seconds between refreshes. 


If the report is designed to toggle the visibility of 
items, those items will return to their original state 
when the report refreshes. For example, if you have 
a row hidden when the document opens, and the user 
clicks an item to show the hidden row, the row will re- 
vert to its hidden state each time the report refreshes. 
Consequently, you should consider limiting the use of 
this feature to reports that display all content on a 
single page without toggling visibility. 

Online pagination. The InteractiveHeight prop- 
erty of the Report item controls the length of a re- 
port when viewed as HTML. If your regional setting 
is U.S., this property defaults to a value of 11", which 
results in online pagination of a report when it ex- 
ceeds a length of approximately 11" on your screen 
and doesn’t contain explicit page breaks. When a re- 
port contains many rows of data, this logical pagina- 
tion improves performance because the first page is 
rendered immediately for viewing, while the remain- 
ing pages render in the background. 

Whether a user has to scroll to view the full length 
of the page depends on the user’s screen size and reso- 
lution and the size of the browser window in which 
the report is viewed. If you want to minimize vertical 
scrolling, you can reduce the InteractiveHeight value, 
but doing so also increases the number of pages in the 
report. Be aware that you might experience inconsis- 
tent online page lengths when you have one or more 
groups defined in a data region, so be sure to test re- 
ports to ensure the maximum page length within the 
report meets your vertical height requirements. 

Alternatively, you might want to eliminate paging 
altogether and display the entire report on a single 
page. For this scenario, simply change the Interactive- 
Height value to 0. Rendering a large report as a single 
page might take longer than a paginated report, so be 
sure to set user expectations accordingly. 

To change the value of the InteractiveHeight 
property, open the Properties pane and select Report 
in the drop-down list at the top of the pane. Locate 
the InteractiveSize property, click the plus sign to ex- 
pand the property, and type a new value in the Height 
property box. Although the property field is labeled 
Height, the report definition language (RDL) refers 
to it as InteractiveHeight. When specifying a value, 
you can also use units of measurement such as cm for 
centimeters, mm for millimeters, pt for points, or pc
for picas. You might also notice an InteractiveWidth
property and be tempted to experiment with it. How- 
ever, the rendering engine currently ignores this prop- 
erty, and changing it has no effect. 

Repeated item with a data region. Suppose you 
have a report that contains a table that spans multiple 
pages when viewed on screen. You probably already 
know that you can set the RepeatOnNewPage prop- 
erty to include the table header or table footer on each 
page on which the table displays, but what if you also 
want to include other items on each page alongside 
the table? Let’s say you want to repeat a text box to 
the right of a table on all pages, for example, but only 
on the pages containing the table. 

You can use the RepeatWith property of three 
types of report items: text box, rectangle, and line. A 
valid value for the RepeatWith property is the name 
of a data region—table, matrix, list, or chart—on 
the report that shares the same parent as the item to 
be repeated. For example, if you have a table in a re- 
port and add a text box next to the table, both the 
table and the text box have the same parent: Body. 
You won't see the text box repeated with the table in 
Preview mode within BIDS, so you must deploy the 
report to the report server to test the results. If you 
specify the RepeatWith property for a rectangle that 
contains a data region, deployment of the report will 
fail. By the way, this property also works for reports 
that you export to PDF. 


Paginated Reports 

Paginated reports are reports exported to PDF or 
TIFF format. When you know a report will usually 
be exported into one of these formats, you'll want to 
think about whether to allow a data region that could 
fit on one page to span two pages or whether to ren- 
der data in multiple columns. 

Single-page data region. When you design reports 
for printing, you can define explicit (physical) page 
breaks by setting page breaks for a data region or for 
groups within the data region. In addition to physical
page breaks, the rendering engine inserts logical page
breaks where content exceeds the page size. In Figure 1,
the first page renders a matrix, then a chart, which is
followed by part of a table. These report elements fit 
within the defined report size of 8.5" x 11", less the 
space allocated to margins, the page header, and the 
page footer. The rendering engine inserted a logical 
page break and rendered the remainder of the table 
on the second page. 

If you want to keep the table together on one
page, one option is to define a physical page break
by setting the table’s PageBreakAtStart property to
True. But what if you want to keep the table on the
first page if the user filters the report to include only a
single category? If you move the table into a list, as I 
describe in the following paragraph, and set its Keep- 
Together value to True, you can achieve your goal. If 
the table doesn’t fit on the same page as the preceding 
items, the rendering engine pushes it to a new page, as 
Figure 2 shows. However, if the table is small enough, 
the rendering engine keeps the table on the same page 
as the preceding items, as in Figure 3. In this case, if 
the table doesn’t fit onto its own page, the rendering 
engine will produce a layout similar to Figure 1 by 
rendering as much of the table as possible on the first 
page and continuing on subsequent pages. 

To fit a table (or a matrix) on a single page, add
a list to the report layout and drag the table into the
list. Be sure to set the Location property of the table
to 0,0 to eliminate white space that might prevent the
table from rendering correctly on one page. Next, in
the Properties pane, select the list item you just added
(e.g., list1 if it’s the first list in your report). In the
DatasetName drop-down list, select the data set you 
used to create the table. Click the Grouping property 
box, click the ellipsis button that appears, and then, in 
the first row in the Expression grid, type “=1” (with- 
out the quotation marks), as Figure 4, page 18, shows. 
Click OK. If necessary, resize the list to the same size 
as the table by clearing the Size property value and 
pressing Enter. The report designer will recalculate 
the Size property based on the list’s contents. Finally, 
set the list’s KeepTogether property to True. 

You must render the report in a paginated format 
to see the result of configuring the list with Keep- 
Together set to True. You won't see the effect of this 
property when you view the report in HTML or Excel 
format. 

Multicolumn report. Sometimes you might want to 
display data in newspaper-style columns (aka snaking 
columns). After you define multiple columns for your 
report, all data regions added to the report will render 
in columns: There is no way to exclude some data re- 
gions from the columnar layout. Consequently, multi- 
column layouts are typically used with a fixed-width 
data region, such as a table or list. 

Let’s look at the report in Figure 5, page 18, which 
displays an employee directory in two columns based 
on data from the AdventureWorksDW database. To 
create this layout, open the Properties pane and se- 
lect Body in the report items drop-down list. Set the 
Columns property to 2. You can also adjust the Col- 
umnSpacing property from its default value of 0.5 to 
increase or decrease the space between columns. Next, 
TEST your KNOWLEDGE

SQL SERVER
Storage Capabilities

1. At a minimum, a SQL Server database
consists of which files?
a) Primary data file and secondary data file
b) Primary data file and transaction log file
c) Primary data file, secondary data file,
and transaction log file
d) Primary data file, transaction log file,
and catalog file

2. What is the recommended file extension for
a primary data file?
a) .mdf
b) .ndf
c) .ldf
d) .sql
e) .ssd

3. What is the recommended file extension for
a secondary data file?
a) .mdf
b) .ndf
c) .ldf
d) .sql
e) .ssd

4. What is the recommended file extension for
a transaction log file?
a) .mdf
b) .ndf
c) .ldf
d) .sql
e) .ssd

5. Which of the following are supported file
extensions for a primary data file?
a) .mdf
b) .ndf
c) .data
d) .sql
e) .bin


ADVERTISING SUPPLEMENT sponsored by DELL 


6. Filegroups are a useful feature for 
administration and performance tuning. A 
filegroup is made up of one or more data 
files. Which of the following statements 
about SQL Server filegroups is true? 

a) Each filegroup in a database can 
contain up to 8 files. 

b) Each file in a filegroup can be up to 1GB 
in size. 

c) Transaction logs can be recorded in the 
primary data file or any secondary data 
file in the filegroup. 

d) Individual files in a filegroup can have
different autogrowth settings.

e) Data files can belong to multiple 
databases on the same server. 


7. Is it possible to manually expand the size
of a data file once it has been created and
its autogrowth properties have been set?
a) Yes
b) No


8. How would you compact the size of the 
data files in a filegroup? 

a) Run the COMPRESS command. 

b) Run the DBCC SHRINKDATABASE 
command. 

c) Run the SQL Server Surface Area 
Configuration Wizard. 

d) You cannot compact the size of data 
files. 


9. If the ANSI_NULLS option is set to 
OFF, what happens when performing a 
comparison such as ColumnA = NULL? 


a) The result is TRUE if ColumnA is NULL. 
Otherwise the result is FALSE. 

b) The result is FALSE if ColumnA is 
NULL. Otherwise the result is TRUE. 

c) The result is always NULL. 

d) The result is NULL if ColumnA is NULL. 
Otherwise the result is the value of 
ColumnA. 

e) SQL Server raises an exception. 


10. Your departmental database server has
suffered serious hardware failures (e.g.,
multiple failing disks) and your users have
noticed some corrupt data in their reports.
What can you do to attempt to repair the
corrupt database?
a) Run sp_sql_repair
b) Run DBCC CLEANTABLE
c) Run DBCC CHECKDB
d) Run DBCC DBREPAIR
e) Start SQL Server Management Studio
in Restore Mode, right-click on the
database, and select the Repair option

11. With the FILESTREAM data type
introduced in SQL Server 2005, large
binary data such as images can be stored
directly in an NTFS file system and
accessed directly in T-SQL. True or false?
a) True
b) False

12. How many timestamp columns can you
add to a table?
a) 1
b) 2
c) 16
d) There is no limit

13. What data structure does SQL Server 2005
use for indexes?
a) Hash tables
b) Hash sets
c) B-trees
d) Linked list mesh

14. How many clustered indexes can you
have on a table?
a) 1
b) 2
c) Unlimited
d) SQL Server 2005 does not support
clustered indexes

15. Which of the following data types cannot
be used as local variables in T-SQL?
a) nchar
b) nvarchar
c) ntext
d) text

16. How big is a bigint?
a) 32 bits
b) 64 bits
c) 256 bits
d) 1024 bits

17. Which of the following statements about
reading XML data from SQL Server 2005
is true?
a) XML can only be returned from a
column of type XML.
b) The FROM XML clause can only
be used with character (e.g., nchar,
nvarchar, ntext) and datetime columns.
c) The FROM XML clause returns XML as
a native data type. Returning XML as
character data is no longer supported.
d) The native XML data type cannot be
used for variables in stored procedures.
e) None of the above

18. In SQL Server 2005, approximately how
much data can be stored in varchar,
nvarchar, and varbinary columns?
a) 1GB
b) 2GB
c) 8GB
d) 2GB in varchar, 1GB in nvarchar, and
8GB in varbinary

Answers: 7) Yes; 8) b; 9) a; 10) c; 11) b; 12) a; 13) c; 14) a;
15) c, d; 16) b; 17) None of the above; 18) b

Figure 4: Grouping and Sorting Properties dialog box

Figure 5: Multicolumn report

expand the Size property and change Width to a new 
value such as 3, to set the width of each column. When 
you press Enter to confirm these changes, the report 
designer will display a design layout for the first column 
and a placeholder for the second column. Add a table 
to this design layout and set the Location property 
to 0,0. 

In the sample report, the table has a table header 
row, two detail rows per record, two columns, and no 
table footer row. Multiple detail rows let you present 
more information related to each record. To add a 
detail row, select the table, right-click the row handle 
for the detail row (the handle with three bars), and 
click Insert Row Above or Insert Row Below. To display
the table header row at the top of each column,
select the row handle for the table header, then in the
Properties pane set the RepeatOnNewPage property
to True. If you omit this step, the rows in each column
on the first page won’t align properly across the page. 

Multiple detail rows in a table might also cause 
alignment problems in the rendered report. For ex- 
ample, the employee name and phone number might 
be rendered in the last row of the first column and the 
corresponding employee title might be rendered in the 
first row of the second column. To avoid separating a 
record’s detail rows, you can define the grouping crite- 
ria for the table to keep multiple detail rows together 
in the same column, as Figure 5 shows. Select the ta- 
ble, right-click the row handle for the detail row, and 
click Edit Group. In the first row in the Expression 
grid, you can type an expression directly into the box, 
select a field from the data set, or select Expression to 
open the Edit Expression dialog box. For example,
you can type the expression =Fields!LastName.Value
+ "," + Fields!FirstName.Value to produce a string
that displays each employee name by last name fol- 
lowed by first name. 

To view the report’s multiple columns, export it to 
PDF or TIFF format. The HTML and Excel versions 
of the report display a single column only. 


Improve Report Presentation 
By delving into some of the lesser-known properties 
of BIDS, you can put some punch into your everyday 
reports. With very little effort, you can design reports 
for online viewing, for export to a paginated format, 
or other uses. You'll have to make some decisions
about report layout, but now that I’ve demystified
several of these properties for you, you're ready to
bring your reports to the next level.
InstantDoc ID 98853

Use T-SQL to Solve the Classic Eight Queens Puzzle

Practice logic and logic programming at the same time

I recently received an interesting puzzle from Roy
Harvey, a friend and fellow SQL Server MVP,
that I found to be a good exercise in both logic
and T-SQL. After I present the problem, I'll provide
the model that Roy used to solve the puzzle with a
computer program. Then, I'll give you both Roy’s and
my T-SQL implementations of the solution. As always,
I suggest that you not look at the solution before you
try to solve the problem on your own.


The Eight Queens Puzzle 

The puzzle is called the eight queens puzzle, and it 
was first suggested by the chess player Max Bezzel 
more than 150 years ago. The challenge is to arrange 
eight queens on an 8 x 8 chessboard such that no two 
queens can attack each other based on regular chess 
rules—that is, that no two queens are in the same row, 
column, or diagonal. Figure 1 shows one possible 
solution. Your challenge is not to find one possible 
arrangement, but rather to write a T-SQL solution that 
returns all possible arrangements. 


The Logical Solution 

You can use various known algorithms to solve the 
problem. For details about these solutions, go to 
en.wikipedia.org/wiki/eight_queens. 

One of the most inefficient methods you can use 
is based on a naive brute force search algorithm in 
which all possible placements of queens are consid- 
ered. With 64 cells and 8 queens, this method involves 
6418 (281,474,976,710,656) permutations—including 
cases in which a cell contains more than one queen. A 
filtering process eliminates the placements with more 
than one queen appearing in the same cell, as well as 
those in which two queens are attacking each other. 

Even when using a brute-force algorithm, you can
model the problem to reduce the initial number of
placements taken into consideration from 64^8 to a
substantially smaller number. Then, you just add the
remaining filtering logic on top to isolate only
qualifying placements.

For example, the model that Roy used in his solution
is to represent each placement of eight queens as
a string of eight distinct digits. Each digit has one of
the values in the range 1 through 8 representing the
row in the chessboard, whereas the digit’s position in
the string (from left to right) represents the column in
the chessboard. So, for example, the solution shown
in Figure 1 would be expressed with this model as
17468253, meaning that the placements of the eight
queens are: row 1, column 1 (1a); row 7, column 2
(7b); row 4, column 3 (4c); row 6, column 4 (6d); row
8, column 5 (8e); row 2, column 6 (2f); row 5, column
7 (5g); row 3, column 8 (3h).

The simplest solution based on this model 
is to initially produce all possible strings 
made of the 8 distinct digits. Surprisingly, this
isn’t a large number of permutations (8! =
40,320), especially in database terms. The fact
that each string contains 8 distinct digits, where the digit 
represents a row and a digit position represents a column, 
already restricts the representation to only one queen per 
row and one per column. That’s 
why the initial number of permuta- 
tions is so small. 
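
As a quick sanity check on the two counts quoted above, the following T-SQL sketch recomputes them. It is only an illustration: the CTE name Fact and the column aliases are hypothetical and don't come from the article's listings.

```sql
-- Naive model: each of the 8 queens can go in any of the 64 cells.
-- The CAST to BIGINT keeps POWER from overflowing the INT range.
SELECT POWER(CAST(64 AS BIGINT), 8) AS naive_placements;

-- Digit-string model: count the orderings of 8 distinct digits (8!).
-- Fact is a hypothetical helper CTE that builds the factorial step by step.
WITH Fact AS
(
  SELECT 1 AS n, CAST(1 AS BIGINT) AS f
  UNION ALL
  SELECT n + 1, f * (n + 1)
  FROM Fact
  WHERE n < 8
)
SELECT f AS distinct_digit_strings
FROM Fact
WHERE n = 8;
```

The two results should match the figures in the text, which shows how much the one-queen-per-row-and-column model shrinks the search space before any diagonal filtering happens.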

What’s left is to filter only the 
permutations in which no two 
queens are placed in the same diag- 
onal. This is achieved by ensuring 
that no two digits in the strings 
satisfy the following predicate: 
The absolute difference between 
the digits is equal to the differ- 
ence between their positions. For 
example, in the string 12473685, 
the representation is such that 


no two queens are in the same row or column. But the 
absolute difference between the second and first digits is 


THE LOGICAL PUZZLE 


Solution to May’s Puzzle: A Cat, a String, and the Earth 
Suppose you lay a string on the ground all around the earth right over 
the equator. The length of the string would be equal to the earth’s equatorial cir- 
cumference—40,075.02 kilometers. Then, suppose you add 1 meter to the string, 
and suspend the string directly above the equator, with an even distance from the 
ground all the way around. Would a cat be able to pass from one hemisphere to 
another below the string? 


Although this puzzle is quite simple, I like it because it’s so counterintuitive. It
probably seems inconceivable that adding only 1 meter to such a large circumference
would make any noticeable difference in the radius, let alone allow a cat to pass
below the string in the space that was added. But if you do the math, you realize that
the existing length of the circumference has no significance in determining how the
radius would be affected when extending the length of the circumference. Instead,
only the length of the addition is significant. The circumference can be expressed as
P = 2πr (2 times pi times the radius). Hence the original radius can be expressed as
r = P/(2π). Adding 1 meter to the existing circumference would change the equation
to: P + 1m = 2πr, where r represents the new radius. Isolating r, you get: r = (P +
1m)/(2π). Expanding the parentheses, you get: r = P/(2π) + 1m/(2π). Since the original
length of the radius was P/(2π), the new radius is 1m/(2π) longer, which is about
16 centimeters (a bit more than 6 inches). That’s enough for a cat to go under and
move from one hemisphere to the other.


June’s Puzzle: Josephus Problem 

The Josephus problem is an ancient puzzle that involves a group of 41 men stand- 
ing in a circle. Going around the circle, every second standing man is executed 
(one skipped, one executed) until only one man is left standing. Assuming that 
the positions are numbered 1 through 41, which position should Josephus (one of 
the men) choose if he could, so that he would be the only one to remain standing? 
Can you generalize the solution for n men? Write a T-SQL solution that returns the 
position based on the input number of men @n. 


InstantDoc ID 98776 
1, and the difference between the positions of the digits
is 1 as well, meaning that the two queens that those digits
represent are placed on the same diagonal. Now that
I’ve covered the solution’s logic, I'll present a couple of
T-SQL implementations of the solution.
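
Before looking at the full solutions, the diagonal predicate can be tried in isolation. The following is a minimal sketch, not part of the published listings; the variable @placement, the helper CTE Nums, and the column aliases are assumptions made for illustration. It lists every pair of columns whose queens share a diagonal in a given eight-digit string.

```sql
-- Hypothetical test harness for the diagonal predicate.
DECLARE @placement VARCHAR(8);
SET @placement = '12473685';  -- the conflicting example from the text

WITH Nums AS  -- positions (columns) 1 through 8
(
  SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
  UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
)
SELECT P1.n AS first_column, P2.n AS second_column
FROM Nums AS P1
  JOIN Nums AS P2
    ON P2.n > P1.n  -- look at each pair of columns once
WHERE ABS(CAST(SUBSTRING(@placement, P2.n, 1) AS INT)
        - CAST(SUBSTRING(@placement, P1.n, 1) AS INT)) = P2.n - P1.n;
```

For 12473685 the result should include the pair of columns 1 and 2 described above. Setting @placement to 17468253, the string for Figure 1, should return no rows, because a valid arrangement has no two queens on a shared diagonal.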


The T-SQL Implementation 
Web Listing 1 (www.sqlmag.com, InstantDoc ID 
98775) shows the T-SQL solution that was written by 
Roy. The solution first defines a common table expression
(CTE) called Eight with a row for each of the eight
digits in the range 1 through 8. The column in the CTE
that holds the digit is called i. The outer query joins
eight instances of the CTE, each of which represents a
different column in the chessboard. The names of the
instances that are being joined are A through H. For
each column that is processed (i.e., each table in the 
join), a match is produced for each digit (row) in the 
new column that didn’t appear so far in previous col- 
umns. The join condition also filters only cases in which 
the queen represented by the new digit isn’t placed in 
the same diagonal with any other queen that appears in 
previous columns. This is achieved as I explained earlier 
by checking that no cases exist in which the absolute 
difference between the digit values is the same as the
difference between the digits’ positions.

For example, the join predicate between the 
instances A and B is 
Eight AS A
JOIN Eight AS B
  ON B.i <> A.i
  AND (B.i - A.i) NOT IN (-1, +1)


The predicate used to match the result with C is 


JOIN Eight as C 
ON C.i NOT IN (A.i, B.i) 
AND (C.i - A.i) NOT IN (-2, +2) 
AND (C.i - B.i) NOT IN (-1, +1) 


and so on. 
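
To see the pattern run end to end, here is a cut-down sketch that covers only the first three chessboard columns. It is an assumption-laden illustration rather than Roy's Web Listing 1: the way the Eight CTE is populated and the output column names are guesses, and the real solution continues the same pattern through instance H.

```sql
-- Eight is populated with UNION ALL here; the published listing may differ.
WITH Eight AS
(
  SELECT 1 AS i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
  UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
)
SELECT A.i AS col_a, B.i AS col_b, C.i AS col_c
FROM Eight AS A
  JOIN Eight AS B
    ON B.i <> A.i                        -- different row
    AND (B.i - A.i) NOT IN (-1, +1)      -- not diagonal, one column apart
  JOIN Eight AS C
    ON C.i NOT IN (A.i, B.i)             -- different row from both
    AND (C.i - A.i) NOT IN (-2, +2)      -- not diagonal, two columns apart
    AND (C.i - B.i) NOT IN (-1, +1);     -- not diagonal, one column apart
```

Each additional instance adds one row-uniqueness test plus one diagonal test per earlier column, with the offset in the NOT IN list equal to the distance between the two columns.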

Web Listing 2 shows the T-SQL solution that I 
wrote. My solution’s logic is very similar to Roy’s, 
except that instead of explicitly joining eight instances 
of the numbers table, it uses a recursive CTE, and 
instead of producing eight different columns, it concat- 
enates a character for each digit in a single string. 

There are three predicates in the join performed 
by the recursive member of the CTE. The first simply 
ensures that the recursive query will run as long as the 
string has fewer than 8 digits: 


ON PRV.strlen < 8 


The second predicate uses pattern matching to ensure 
that the new digit doesn’t already appear in the string: 


AND PRV.string NOT LIKE '%' + CAST(NXT.n AS VARCHAR(1)) + '%'


The last predicate ensures that the new queen isn’t placed 
on the same diagonal as any other queen that is already 
represented in the string (absolute difference between 
digit values equal to difference between digit positions): 


AND NOT EXISTS 
(SELECT * 
FROM Nums AS D 
WHERE D.n <= PRV.strlen 
AND PRV.strlen + 1 - D.n = 
ABS(NXT.n - CAST(SUBSTRING(PRV.string, D.n, 1) 
AS INT))) 
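
Putting the three predicates together, a recursive CTE along these lines should produce the qualifying strings. This is a sketch under stated assumptions, not Web Listing 2 itself: the CTE name Queens, the inline Nums CTE (the article appears to use a Nums table), and the anchor member are all reconstructed from the description above.

```sql
WITH Nums AS  -- stand-in for the numbers table; values 1 through 8
(
  SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
  UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
),
Queens AS
(
  -- Anchor member: one single-digit string per possible row in column 1.
  SELECT CAST(n AS VARCHAR(8)) AS string, 1 AS strlen
  FROM Nums

  UNION ALL

  -- Recursive member: append a digit that passes all three predicates.
  SELECT CAST(PRV.string + CAST(NXT.n AS VARCHAR(1)) AS VARCHAR(8)),
         PRV.strlen + 1
  FROM Queens AS PRV
    JOIN Nums AS NXT
      ON PRV.strlen < 8
      AND PRV.string NOT LIKE '%' + CAST(NXT.n AS VARCHAR(1)) + '%'
      AND NOT EXISTS
        (SELECT *
         FROM Nums AS D
         WHERE D.n <= PRV.strlen
           AND PRV.strlen + 1 - D.n =
               ABS(NXT.n - CAST(SUBSTRING(PRV.string, D.n, 1) AS INT)))
)
SELECT string
FROM Queens
WHERE strlen = 8;
```

The explicit CAST to VARCHAR(8) in both members keeps the column types identical, which a recursive CTE requires. If the sketch is correct, the number of rows returned should equal the 92 solutions the article mentions.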


For fun, I created the code in Web Listing 3 to random- 
ly choose one of the 92 possible solutions. The code also 
adds unpivoting and pivoting logic to produce a graph- 
ical depiction of the solution, as Web Table 1 shows. 


A Useful Exercise 
Although the solutions I present to the eight queens 
puzzle aren’t the most efficient possible solutions, they're 
pretty fast—both running in well under a second. This 
puzzle is very suitable for set-based processing with SQL, 
especially with brute-force methods in which you need to 
create permutations. I hope you enjoyed this interesting 
exercise in T-SQL as much as I did. Many thanks to Roy
Harvey for sending me the puzzle.
InstantDoc ID 98775

Security Logging Using Service Broker

Employ this solution as the framework for
centralized logging of any event category

Service Broker, which was a new component in
SQL Server 2005, provides many new ways of
thinking about database applications. When
used in conjunction with Microsoft IIS and other SQL 
Server technologies, it can provide the core architecture 
needed for distributed server-side applications, reliable 
query processing and data collection, large-scale batch 
processing, and data consolidation for client applica- 
tions. Let’s look at data consolidation to see how the 
Service Broker technology can be used to implement 
a security monitoring and logging solution for any 
organization. 


Components of a Service 
Broker Solution 

A Service Broker solution consists of message types, 
contracts, queues, and services. Let’s review each of 
these objects briefly before moving on to the actual 
Service Broker solution we're focused on implementing 
in this article. 

A message type is a message category that's used by 
an application. Defining a message type ensures that 
only messages that can be understood by the processing 
service can be submitted. Message types are similar to 
email messages in that they have a subject (the message 
type itself) and a body (the payload of the message). 
Message types are used in conversations between an 
initiator service and a target service. 

A contract is an agreement between the initiator and 
the target. The contract defines the message type that 
can be sent by the initiator and the valid return that can 
be sent by the target. 

The queue is the storage location for the messages 
that are intended for processing by a specific applica- 
tion. When messages arrive for a specific queue, Service 
Broker is responsible for placing the messages in that 
queue. You can think of the queues as storage bins for 
the messages waiting to be processed. 

Finally, the service is the endpoint that the application
uses to access the queue. The service specifies the
queue and contracts that can be used to store informa- 
tion in and retrieve information from that queue. A 
service that doesn’t specify a contract can only be an 
initiator; it can’t be a provider or target. 

One additional component is needed: A service 
program is simply an external application or an internal 
stored procedure that’s used to process the messages 
stored in the queues. If you don’t implement a service 
program, the messages will simply sit in the queue and 
never be processed. 


Implementing the Service 
Now, I'll walk you through the implementation of a
centralized security-incident logging solution. You'll 
learn how to create message types, contracts, queues, 
and services, all of which are needed in the solution. 
You'll also learn how to create simple Active Server 
Pages (ASP) code that calls a stored procedure to place 
information in the security logging queue for auto- 
mated processing. ASP.NET would be the preferred 
solution in a production environment; however, I'll use 
ASP in this example because it lets me keep the code 
simple all the way through. Finally, you'll learn how to 
read messages from the queue and use stored proce- 
dures to process them. 
The first thing you need to do is create a database for your
incident logs. Create a very basic database named seclog by using
the following simple command:

CREATE DATABASE seclog;

InstantDoc ID 97954

Next you need to enable Service Broker for this database, which
you can do with the following command:

ALTER DATABASE seclog
SET ENABLE_BROKER;

Finally, you must create a table for storing the security
incidents after they’re processed by the service program.
These commands will work for the simple table
you need to create:

USE seclog;

CREATE TABLE incidents
(id INT PRIMARY KEY IDENTITY(1,1),
incident varchar(50));

Now that you have the database and table in place,
you can begin creating the Service Broker objects.
You'll create them in this order:

1. Message Types
2. Contracts
3. Queues
4. Services

LISTING 1: Creating the Message Types

CREATE MESSAGE TYPE
[//seclog/incidents/RequestMessage]
VALIDATION = WELL_FORMED_XML;

CREATE MESSAGE TYPE
[//seclog/incidents/ReplyMessage]
VALIDATION = WELL_FORMED_XML;

GO

LISTING 2: Creating the Contract

CREATE CONTRACT [//seclog/incidents/IncContract]
([//seclog/incidents/RequestMessage]
SENT BY INITIATOR,
[//seclog/incidents/ReplyMessage]
SENT BY TARGET
);

GO

LISTING 3: Creating the Queues and Services

CREATE QUEUE TargetQueue;

CREATE SERVICE
[//seclog/incidents/TargetService]
ON QUEUE TargetQueue
([//seclog/incidents/IncContract]);

GO

CREATE QUEUE InitiatorQueue;

CREATE SERVICE
[//seclog/incidents/InitiatorService]
ON QUEUE InitiatorQueue;

GO

The first step is to define a valid message type. The
security logging solution will need to receive a message
that contains the incident description
to be entered in the incidents table.
Call this message RequestMessage. You
also need to create another message type
for the response, although the actual
implementation won't use it. Call this
message ReplyMessage. Listing 1 shows
the code to create the message types.
Notice that the code validates that
the payload contains well-formed
XML. This simply means that
all nested elements are closed
properly. It doesn’t require the
use of an XML schema. That
would require the option
VALIDATION = VALID_XML WITH SCHEMA COLLECTION.
Now that the message types 
have been created, you can create 
a contract that uses them by 
typing the command in Listing 
2. The contract, named IncCon- 
tract, specifies the message type 
of RequestMessage for sending 
by a conversation initiator. Notice 
that the code specifies a Reply- 
Message that can be sent by the 
target. We won't be using that
message type in our solution, but 
adding it to the contract allows 
you to use it at a later time with 
little effort. 
Now that you've created the message type and the contract,
you can create the queue for storing the incoming security
incidents. You also need to create the service, which you can
accomplish with the first pair of CREATE QUEUE
and CREATE SERVICE commands in Listing 3.
Because SQL Server 2005 supports only dialogues
and not monologues, you have to create another
queue and service that you won't actually use (i.e., the
second CREATE QUEUE and CREATE SERVICE
pair in Listing 3). This technique lets you implement a
monologue scenario with SQL Server 2005's dialogue
structure.

The Service Broker is now set up, but you also 
need a stored procedure to process messages dropped 
into TargetQueue. Listing 4 shows the code for the 
stored procedure. You'll notice that the code doesn’t 
actually use InitiatorQueue. As I mentioned earlier, 
it exists to set up the conversation appropriately 
because SQL Server 2005 lacks support for monologues.
Ultimately, the stored procedure takes the
input of an incident description and outputs it to a 
SQL Server table. 
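
To check the plumbing before involving a Web page, you could call the procedure from a query window. This sketch assumes the uspIncident procedure from Listing 4 and the incidents table created earlier already exist; the description value is a made-up sample.

```sql
-- Send one sample incident through Service Broker (the value is hypothetical).
EXEC uspIncident N'Unusual Firewall Activity';

-- Confirm that the message body reached the incidents table.
SELECT incident FROM dbo.incidents;
```

The procedure itself also returns the sent and received message as result sets, so a mismatch between the two is visible immediately.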

At this point, you have the infrastructure in place 
to begin entering data in the queues; however, I want 
to show you how to create a simple ASP page that 
can be called to place an entry in the queue. Listing 
5 shows some sample code that you could save in an 
ASP file and place on an internal IIS server. You'll 
notice the simple way this ASP page can be called. It 
looks for a parameter called Description to provide 
the information related to the incident. For example, 
incident.asp?Description=some_code would work 
just fine. 


Utilizing the Solution 

With the infrastructure in place, any device can now
report a security incident with a simple HTTP connection.
For example, assuming the previous ASP code
was saved to a file named incident.asp at the root of a
Web server, you could enter the following URL


http://webserver/incident.asp?Description=Unusual Firewall Activity


into a Web browser to add a new security incident.

Many OSs support command-line tools that you 
can use to execute HTTP GET commands (which is 
what a Web browser does when you enter a URL). 
As long as you can run a batch file when a security 
incident occurs on a Windows computer, you can add 
information to the incident log. You need only a client 
computer with a full TCP/IP stack and an application 
that can submit an HTTP request. The valuable UnxUtils
collection available at unxutils.sourceforge.net
includes a wget executable for the Windows command 
line. With this free utility, you can execute the following 
or a similar command: 
Top 10 Items to consider when evaluating your database infrastructure;
where do you currently stand?


The wind of change is in the air for your corporate databases. Change is coming in the form of SQL Server 
2008, Windows Server 2008, advances in virtualization technology, and changes in server hardware. The 
use of powerful quad-core servers is becoming common. And we'll see new networking technologies such 
as GbE over copper and advances in server storage using SAN, DAS, and iSCSI technology. These changes 
mean that it is possible to deploy more powerful and effective databases and expand database-driven 
applications further into your enterprise to gain competitive business advantages. 

Regardless of your specific need (upgrading an existing database infrastructure, deploying new
database back ends, or introducing additional database-driven applications into your environment),
you should consider these 10 points before starting any of those projects to maximize your success.


Who are the consumers of the data contained in your databases?

You should know who needs to access your data because those users can drive decisions that affect your
project based on their business needs. Do they need new applications? Are their existing applications due
for an upgrade? Many internal departments, such as Human Resources, Accounting, Manufacturing and Logistics, can
have database-driven applications that place very specific demands upon the database and its infrastructure.
Web-based applications that deliver content to external users have their own set of demands, as will web-based catalog
and order entry systems.


Identify the line-of-business database-driven applications used by
your business.

Because of their impact on the business process, key applications may require upfront planning and
consideration before you move forward with upgrades of software, infrastructure or hardware. Specific applications
such as CRM and ERP, along with custom-developed applications, need to be documented and tested against any
planned changes.


Identify the data delivery mechanisms required by your
business processes.

How are users getting their data? Do they live on the same network as the databases, or do they access
them remotely from within the corporate environment? Is data being delivered to external users or mobile users?
Understanding the data delivery methodologies is critical to a successful deployment or upgrade of your database
solutions. All of these choices have their own impact on the design and implementation of both the database software
and hardware infrastructure.


Identify your current database storage model.

Where does your data currently reside? Are you using a storage SAN for your databases? Is there a
combination of NAS, SAN and DAS storage used within the database hardware infrastructure?
Are you currently using iSCSI? Do your plans include or require upgrading or redesigning
your database storage architecture? How fast does your storage subsystem respond
to I/O requests? What is the nature of your I/O (e.g., OLTP = small random I/O or
Business Intelligence = large block I/O)?


Identify your database data protection model. 

What do you currently do for data protection? Daily incremental backups, continuous data protection? 
What do your current disaster recovery plans look like? What are your database availability requirements 
and are they currently being met? Evaluate your current data protection scheme and the impact changes to your 
database infrastructure will have on the process. 




Check your compliance with database best practices. 

As you plan the upgrades to your database infrastructure you are provided an excellent opportunity
to determine your level of compliance with database best practices, not just for generic database
implementations but also those that apply to specific database-driven solutions such as CRM or ERP. The more 
documentation and information you have at hand on your database design, implementation, and usage the easier 
your upgrade will go. If you don't have these standards in place, plan for their implementation as part of the database 
upgrade/migration process. 


Ensure that your system meets your user service-level agreements.

Typically, service-level agreements are measured in terms of response time or throughput rates. As user
workloads grow, increased demands are placed on existing hardware infrastructures. In addition, new 
applications contribute to resource consumption. Therefore, this is your opportunity to evaluate your database storage 
best practices. If you are tracking information such as I/O and usage patterns, storage layout and design, and user 
response time, you'll be able to more accurately determine what you need to do to improve your database performance 
as your database infrastructure grows and changes. 




Evaluate your hardware infrastructure. 

Changes to your database may require changes to the hardware that you run your database on and to your 
networking architecture. With the new software and operating systems able to take better advantage of 
powerful 64-bit processors and huge amounts of memory, you may need to upgrade your database server hardware to 
get the most from your new software implementation. Higher performance networking may also be required, because 
virtualization and server consolidation play a role in your upgrade process. 


Identify current areas of weakness in your database infrastructure.

Where are your current problem areas? Are there applications or hardware that require an undue amount
of IT support? Are you dealing with end-of-life applications or supporting legacy software? Are there 
changes that have to be made just to continue providing your current level of service to your users? What unsolicited 
feedback is IT getting from the users of your database-driven applications? 




Identify areas where you plan to grow the business that will require 
changes/additions to your database infrastructure. 

This is the time to talk with users, from department heads to the folks at the console, and find out where 
their concerns lie. What does the future hold for the line-of-business and mission-critical applications that depend upon 
the database backend? What plans do other departments have that will require specific database support from IT? Are 
there considerations being made to implement large-scale projects, such as database-driven manufacturing systems
or human resource management tools? Are plans being made that require specific sets of upgrades to the database
infrastructure, additional storage requirements, geographic distribution of responsibilities, or any other action that will 
directly impact the database and its data delivery mechanisms? 

It is likely that fundamental changes in the business will affect existing and new applications, as well as alter the 
workload which users place on your infrastructure. A good understanding of user transaction/query mix, volume and 
batch windows is critical when planning future infrastructure design. Your analysis should also include consolidation and 
virtualization projects to reduce IT operational costs. 
LISTING 4: Creating the Stored Procedure

CREATE PROCEDURE uspIncident
    @description sysname
AS
DECLARE @dlg_handle UNIQUEIDENTIFIER;
DECLARE @msg NVARCHAR(100);
DECLARE @RecvDlgHandle UNIQUEIDENTIFIER;
DECLARE @RecvMsg NVARCHAR(100);
DECLARE @RecvMsgName sysname;

-- Send the incident to TargetService.
BEGIN TRANSACTION;

BEGIN DIALOG @dlg_handle
    FROM SERVICE [//seclog/incidents/InitiatorService]
    TO SERVICE N'//seclog/incidents/TargetService'
    ON CONTRACT [//seclog/incidents/IncContract]
    WITH ENCRYPTION = OFF;

SELECT @msg = N'<Incident>' + @description + N'</Incident>';

SEND ON CONVERSATION @dlg_handle
    MESSAGE TYPE [//seclog/incidents/RequestMessage]
    (@msg);

SELECT @msg AS SentRequestMsg;
COMMIT TRANSACTION;

-- Receive the message from TargetQueue and log it.
BEGIN TRANSACTION;

RECEIVE TOP(1)
    @RecvDlgHandle = conversation_handle,
    @RecvMsg = message_body,
    @RecvMsgName = message_type_name
FROM TargetQueue;

INSERT INTO dbo.incidents (incident) VALUES (@RecvMsg);
SELECT @RecvMsg AS Received;
COMMIT TRANSACTION;
GO


wget http://webserver.com:80/incident.asp?description=405:10.10.10.43


Take note of how the description is formatted in the
wget example. Instead of a descriptive phrase such as
“Unusual Firewall Activity,” there’s a code (405) and
the IP address of the source device. You'll probably
want to develop a set of standard codes for security
events that occur in your organization. For example,
the code 304 could represent a possible malware infection
and the code 305 could represent a possible spam
attack. Mapping wordy attack descriptions to simple
numeric codes keeps your incident tracking database
small and the HTTP commands simple.
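
One way to keep such codes consistent is a small lookup table. The sketch below is an assumption rather than part of the original solution: the table name incident_codes, its columns, and the variable names are hypothetical, and only the 304 and 305 meanings come from the examples above. It also shows how a description in code:address form could be split apart.

```sql
-- Hypothetical lookup table for the organization's standard incident codes.
CREATE TABLE dbo.incident_codes (
    code    INT           NOT NULL PRIMARY KEY,
    meaning NVARCHAR(100) NOT NULL
);

INSERT INTO dbo.incident_codes (code, meaning) VALUES (304, N'Possible malware infection');
INSERT INTO dbo.incident_codes (code, meaning) VALUES (305, N'Possible spam attack');
GO

-- Split a description such as 304:10.10.10.43 into its code and source address.
-- This assumes the value always contains a colon; add a check before relying on it.
DECLARE @description NVARCHAR(100), @colon INT;
SET @description = N'304:10.10.10.43';
SET @colon = CHARINDEX(N':', @description);

SELECT c.meaning,
       SUBSTRING(@description, @colon + 1, LEN(@description)) AS source_address
FROM dbo.incident_codes AS c
WHERE c.code = CAST(LEFT(@description, @colon - 1) AS INT);
```

Keeping the meanings in one table means reports can join to readable text while the devices continue to send only short codes.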

That’s it! No need to install a SQL Server client utility, 
because you're communicating with Service Broker via a 
stored procedure that’s executed through an ASP page 
on the IIS server. For many network administrators and 
IT employees, this functionality is phenomenally helpful. 
You could expand it well beyond just security logging; 
it could easily become the framework for centralized 
SECURITY LOGGING


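LISTING 5: Creating an ASP Page to Place an Entry in the Queue

<HTML>
<HEAD><TITLE>Incident Addition</TITLE></HEAD>
<BODY>
<%
' Connect to the seclog database.
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "PROVIDER=SQLOLEDB;DATA SOURCE=WIN2k3R2EE;" & _
    "UID=asp_app;PWD=asp_pass;DATABASE=seclog"

' Read the incident description from the query string.
Description = Request.QueryString("Description")

' Call the stored procedure. Single quotes are doubled so that the
' value cannot break out of the string literal.
strSQL = "exec uspIncident '" & Replace(Description, "'", "''") & "'"
conn.Execute strSQL

Response.Write("Incident reported containing the description: " & _
    Server.HTMLEncode(Description))

conn.Close
%>
</BODY>
</HTML>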


logging of any event categories that you must report. It 
could even save you thousands of dollars in client agent 
licenses for third-party applications. Of course, you'd 
need the proper SQL Server 2005 or later licensing and 
the sweat and tears to fully develop your solution. 

You could enhance this logging solution to provide 
automatic notification of specified events based on 
risk-level IDs or the point on the network at which the 
incident occurred. This could be accomplished by using 
the Notification Services architecture or a custom-built 
solution that simply monitors the incidents table for 
the relevant entries. Additionally, you could create 
modules so that more devices could communicate with 
the database. For example, a solution could be built 
that receives SNMP alerts and forwards them to the 
security incident database when applicable. 

The most obvious enhancement to this solution
would be to take even greater advantage of the Service
Broker component. You could implement an activation
scheme that fires a stored procedure as soon as an
item is placed in TargetQueue. This would mean that
the automatic processing currently employed would
be removed from the uspIncident stored procedure.
The new stored procedure could still add the data to
the incidents table, but it could additionally look for
specific codes and then take predetermined actions
such as launching external scripts that would shut
down sections of the network or individual computers
that might be problematic. Once you have the infrastructure
in place, the hooks and branches that can be
implemented are practically endless.
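
The activation idea might be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the procedure name uspProcessIncident, the five-second timeout, and the single queue reader are hypothetical choices, while TargetQueue, the RequestMessage type, and the incidents table come from the listings above. With this in place, the RECEIVE and INSERT steps would be dropped from uspIncident.

```sql
-- Hypothetical activation procedure: drains TargetQueue and logs each incident.
CREATE PROCEDURE dbo.uspProcessIncident
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @RecvDlgHandle UNIQUEIDENTIFIER;
    DECLARE @RecvMsg NVARCHAR(100);
    DECLARE @RecvMsgName sysname;

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION;

        -- Wait up to 5 seconds (assumed value) for the next message.
        WAITFOR (
            RECEIVE TOP(1)
                @RecvDlgHandle = conversation_handle,
                @RecvMsg = message_body,
                @RecvMsgName = message_type_name
            FROM TargetQueue
        ), TIMEOUT 5000;

        -- Nothing arrived: leave the loop so the procedure can exit.
        IF (@@ROWCOUNT = 0)
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;
        END

        IF (@RecvMsgName = N'//seclog/incidents/RequestMessage')
        BEGIN
            INSERT INTO dbo.incidents (incident) VALUES (@RecvMsg);
            -- Code-specific actions (alerts, scripts) would branch from here.
        END
        ELSE IF (@RecvMsgName IN (
            N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
            N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'))
        BEGIN
            END CONVERSATION @RecvDlgHandle;
        END

        COMMIT TRANSACTION;
    END
END;
GO

-- Attach the procedure to the queue so Service Broker starts it on arrival.
ALTER QUEUE TargetQueue
WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.uspProcessIncident,
    MAX_QUEUE_READERS = 1,
    EXECUTE AS OWNER
);
GO
```

A single queue reader keeps incidents in arrival order; raising MAX_QUEUE_READERS trades that ordering for throughput.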

The ultimate goal of this article is to provide you
with a real-world example of using Service Broker to
solve a problem. I hope it helps you to begin thinking
of even more and better ways to use this technology.
I'd love to hear from you if you have ideas about how
to expand this example.
Data Warehousing: Degenerate Dimensions

Mapping control numbers from the OLTP
database to the data warehouse


A degenerate dimension is not one that lacks
moral structure or integrity. Instead, a
degenerate dimension is a dimension that
doesn't exist as a table but is represented in the data
warehouse.

Data warehouse dimensional design requires you 
to include control documents such as invoices, orders, 
and warranties. Each of these control documents has 
a control number such as the invoice number, the 
order number, or the serial number of the item under 
warranty. Degenerate dimensions are simply control 
numbers that are stored in the fact table of a data warehouse.
These control numbers look like keys, but they
don’t act like keys; they have no associated dimension 
to join with. Control numbers provide a way to identify 
which line items in the fact table were generated as a 
part of the same order or invoice. Let’s take a look at 
how to map control numbers from the OLTP database 
to the fact table in the data warehouse and associate 
them with each line item. 


How to Map Control Numbers 
to the Fact Table 

Control numbers originate in the OLTP database. For 
instance, in an ordering system, the order data is stored 
in two OLTP tables, the Order Header table and the 
Order Detail table. These two tables have a one-to- 
many (1:M), parent-to-child relationship, as shown on 
the left side of Figure 1, page 28. For a detailed explanation
of this two-level structure, see “Metamodel for
Retail Sales,” May 2001, InstantDoc ID 20409. The 
Order Header (parent) table contains data such as the 
date and time of the order, who took the order, who 
placed the order, where the order was placed, and the 
control number. The Order Detail (child) table contains 
the line items, such as what was ordered, the number 
of each item ordered, and the associated costs. When 
you transfer these tables from the OLTP database to a 
data warehouse, these two tables become a single fact 


SQL Server Magazine • www.sqlmag.com


table that’s populated with keys and measures (i.e., 
attributes that you can add up), such as the number 
of items ordered and the associated costs (e.g., the 
cost of the items ordered, discounts, various taxes, 
shipping and handling by item), as shown on the right 
side of Figure 1. 

In Figure 1, the dashed lines between the OLTP 
database and the data warehouse indicate how the 
attributes in the OLTP database are mapped to the fact 
table in the data warehouse. The two red mapping lines 
identify the control numbers from the Order Header 
and the Order Detail tables. Note that the control numbers
are mapped to the fact table, but unlike the other
attributes from the OLTP database, they’re not treated 
as keys once they’re in the data warehouse. 

The granularity of the fact table should typically
be at the line-item level; every record in the fact table
will be a line item from the actual order, which is represented
by a record in the OLTP Order Detail table.
This level of granularity enables you to query the data
warehouse in multiple ways. For example, you can
query which product sold best on a Monday morning
between the hours of 9 a.m. and 10 a.m. But do you
really need control numbers, such as the OrderID or
the OrderDetailID, to respond to such queries?

The short answer is “no,” but you do need these 
control numbers in the data warehouse. Without these 
control numbers, you wouldn’t be able to reassemble 
the order as it was originally placed, track the history 
of the order (e.g., when it was picked, packed, shipped, 
delivered, returned, refunded), or even do something 
as simple as calculate the average number of items per 
order. The control numbers are valuable, so you can’t 
throw them away. 
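
As a rough illustration of why the control numbers matter, the queries below group line items by OrderID. They are a sketch under assumptions: the fact table name dbo.OrderFact and the sample order number are hypothetical, the column names follow the mapping described for Figure 1, and "items per order" is read here as units ordered (swap SUM(Quantity) for COUNT(*) to count line items instead).

```sql
-- Average number of items per order, grouping line items by the OrderID control number.
SELECT AVG(CAST(PerOrder.ItemsInOrder AS DECIMAL(10, 2))) AS AvgItemsPerOrder
FROM (
    SELECT OrderID, SUM(Quantity) AS ItemsInOrder
    FROM dbo.OrderFact            -- hypothetical fact table name
    GROUP BY OrderID
) AS PerOrder;

-- Reassemble one order as it was originally placed.
DECLARE @OrderID INT;
SET @OrderID = 1001;              -- hypothetical order number
SELECT OrderDetailID, ProductKEY, Quantity, Price
FROM dbo.OrderFact
WHERE OrderID = @OrderID
ORDER BY OrderDetailID;
```

Neither query is possible once OrderID has been dropped, because nothing else in the fact table ties line items to the order they came from.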


Handling Control Numbers
in the Data Warehouse

So how do you handle these control numbers in the
data warehouse? As Figure 1 shows, strip the identifiers
from the Order Header table and include them as keys
in the Order fact table. These identifiers are OrderDate,
which maps to TimeKEY in the fact table; StoreID,
which maps to StoreKEY; and SalesPersonID, which
maps to EmployeeKEY. The customer identifiers,
BillToCustomerID and ShipToCustomerID, both map to
fact table keys (i.e., BillToCustomerKEY and
ShipToCustomerKEY) that are associated with the
Customer dimension. Each of these keys is joined to an
associated Customer dimension, which is shown in
abbreviated form in Figure 1.

Figure 1: Mapping attributes from the OLTP database to the data warehouse

Next, you'll want to follow the same procedure for the
Order Detail table. Then map ProductID to ProductKEY in
the fact table, Quantity to Quantity, and Price to Price.
What's left over from the OLTP tables are the control
numbers, OrderID and OrderDetailID.

Now place the control numbers in the fact table 
and associate them with every line item that’s a part 
of each order. I recommend positioning the control 
numbers immediately below the fact table key columns, 
and just above the typical numeric facts (e.g., Quantity), 
which are also called measures. Although the control 
numbers will look like foreign keys, they won't have 
any corresponding dimensions to join with, thus the 
name degenerate. 
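
The column layout recommended above could look something like the following. This is a sketch only: the table name dbo.OrderFact and every data type are assumptions, and the column names are taken from the Figure 1 mapping described in the text.

```sql
-- Hypothetical fact table: keys first, then control numbers, then measures.
CREATE TABLE dbo.OrderFact (
    -- Foreign keys that join to dimension tables
    TimeKEY            INT   NOT NULL,
    StoreKEY           INT   NOT NULL,
    EmployeeKEY        INT   NOT NULL,
    BillToCustomerKEY  INT   NOT NULL,
    ShipToCustomerKEY  INT   NOT NULL,
    ProductKEY         INT   NOT NULL,
    -- Degenerate dimension: control numbers with no dimension table to join
    OrderID            INT   NOT NULL,
    OrderDetailID      INT   NOT NULL,
    -- Measures
    Quantity           INT   NOT NULL,
    Price              MONEY NOT NULL
);
```

No FOREIGN KEY constraint is declared on OrderID or OrderDetailID, which is exactly what makes them degenerate.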

If, by chance, there are other attributes in the OLTP
Order tables that don't break down logically into
dimensions in the data warehouse, and these attributes
are associated with a specific order or the details of
an order, then create a real dimension that contains
the order number, the order detail number, these attributes,
and a surrogate dimension key. In that case, the
dimension is no longer degenerate; it's just another set
of foreign keys in the fact table.
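
For the case just described, a real order dimension might be sketched like this. Everything beyond OrderID and OrderDetailID is hypothetical: the table name, the surrogate key name OrderKEY, the data types, and the two leftover attributes are invented purely to show the shape.

```sql
-- Hypothetical order dimension holding leftover order-level attributes.
CREATE TABLE dbo.OrderDim (
    OrderKEY             INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- surrogate dimension key
    OrderID              INT           NOT NULL,  -- control number from Order Header
    OrderDetailID        INT           NOT NULL,  -- control number from Order Detail
    GiftWrapFlag         CHAR(1)       NOT NULL,  -- hypothetical leftover attribute
    SpecialInstructions  NVARCHAR(200) NULL       -- hypothetical leftover attribute
);
```

The fact table would then carry OrderKEY as an ordinary foreign key in place of the two control-number columns.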


Efficiently Store Control 
Numbers in the Data 
Warehouse 
Control numbers serve a very important function in 
the OLTP databases in which they originate. When 
the OLTP database is imported into a data warehouse, 
the control numbers also have to be carried forward; 
otherwise, you risk losing meaningful information. 
The most efficient and effective way to handle these 
control numbers is to embed them in a fact table as a 
degenerate dimension.
InstantDoc ID 98722 
The clustering experience has changed
with each new version of Windows
and SQL Server, and the release of
Windows Server 2008 is no exception. As you
know, the new release of Windows has come
before the release of SQL Server 2008. This
Essential Guide will explore what that means
for both SQL Server 2005 and the upcoming
release of SQL Server 2008.


What's New in Windows Server 2008 Clustering?

Before delving into how SQL Server (both
2005 and 2008) will be supported with the new
version of clustering in Windows Server 2008,
I will first highlight some of the new features
and improvements in a Windows clustering
implementation. There are many, and the most
important are discussed below.


• The first thing to note is that the terminology for
clustering is changing. Instead of using a server
cluster to denote the Windows portion of clustering,
Windows now uses failover clustering terminology.
So going forward in this article, I will make
sure to clarify whether I'm referring to a feature
of Windows failover clustering or the failover
clustering feature of SQL Server.


• Before Windows Server 2008, solutions had to
appear either in the old Hardware Compatibility
List or the newer Windows Server Catalog as fully
qualified cluster solutions or multi-cluster storage
devices. That is no longer the case with Windows
Server 2008. The new operating system ships with
a Cluster Validation tool (see Figure 1) that you
are required to run before clustering the nodes.
The validation tool is a GUI-based utility that
verifies various aspects of your cluster, including
network, storage, the underlying operating systems
of each node, and so on. If your configuration
does not pass validation, it is not a supported
cluster solution. If your configuration passes, you
can then cluster. However, you should not treat
this as an open license to throw wildly differing
hardware together to make a production cluster. I
recommend that you stick to the same vendor and
same server type where possible, with the only
deviations being processor and memory (and even
those should be the same or bigger, if you are
setting up a dedicated failover node).

Figure 1: Sample failover cluster validation report


• Before Windows Server 2008, Cluster
Administrator was a standalone utility. Cluster
Administrator is now an MMC-based snap-in.


• Windows Server 2008 failover clustering allows
nodes and clustered resources to get their IP
addresses via DHCP.


• Windows Server 2008 failover clustering does
not require all nodes to be on the same network
subnet.


• The private network (heartbeat) has changed from
multicast to unicast.


• Failover clustering now has a Volume Shadow
Copy Service (VSS) provider to enable easier
backups of cluster nodes.


• Disks and LUNs larger than 2TB are now
supported with GUID partition table (GPT) disks.
This is good news for clustered implementations
that will support VLDBs. Master Boot Record
(MBR) disks are also supported.


• To improve security, a domain account is no longer
required to configure a Windows Server 2008
failover cluster. The account is
now an account local to each
node, with the same privileges
that the domain-based account
would have had.


• Windows Server 2008 has four
quorum types. Windows Server
2003 had two: disk-based and
Majority Node Set. These still
exist in Windows Server 2008,
with slightly different names
(No Majority: Disk Only and
Node Majority, respectively).
The new quorum types are
Node and Disk Majority and
Node and File Share Majority.
The goal of the new quorum
types in Windows Server 2008
is to remove disk as a single
point of failure. Node and Disk Majority
and Node and File Share Majority are
similar. They work on a concept similar to majority node
set, but both the nodes and the disk or file share get
votes; as long as the majority is up, the cluster will be up.


• Failover clustering has a maintenance mode that allows
you to perform operations on, or repair, a node for a
period of time and that prevents accidental failovers.
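
The vote counting behind the new quorum types in the list above can be worked through with simple arithmetic. The snippet below is only an illustration run in a query window; the four-node cluster with one witness disk is an assumed example, and the variable names are hypothetical.

```sql
-- Node and Disk Majority: every node and the witness disk each get one vote.
DECLARE @nodes INT, @witness INT, @total INT, @needed INT;
SET @nodes   = 4;                 -- assumed example cluster size
SET @witness = 1;                 -- the disk (or file share) vote
SET @total   = @nodes + @witness;
SET @needed  = @total / 2 + 1;    -- integer division: more than half the votes

SELECT @total           AS TotalVotes,
       @needed          AS VotesNeeded,
       @total - @needed AS FailuresTolerated;
```

With four nodes and a witness there are five votes, three are needed, and any two voters can be lost, including the disk itself, which is how the disk stops being a single point of failure.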


What's New in SQL Server 2008 Failover Clustering?

SQL Server 2008 is not a release focused on vast 
changes in the availability architecture. Failover 
clustering is basically the same sans the new installation 
process. This means that you still can’t cluster services 
such as SQL Server Integration Services (SSIS) and 
SQL Server Reporting Services (SSRS) via the SQL 
Server installation process. 


It is important to note that the first release of SQL Server 
2008 that included the failover clustering feature was 
CTP 6, and that is what this section is based upon. 
Some of the information may change by the time of RTM 
later this year, so consider this section a first look at 
failover clustering in SQL Server 2008. 


The biggest thing you will have to worry about is 
installing SQL Server 2008. Some of the changes 
include: 


• The install process is completely redesigned, so you
will want to spend some time getting familiar with it.


• Installation is done on a per instance, per node basis.
What does that mean? When you install a SQL Server
2008 failover clustering instance, you first install the
instance on a single node. Then, on each subsequent
node independently of that initial install, you run the
installation to add the node to the already created cluster
instance. So if you had a four-node, six-instance failover
cluster, you would run the installer (either GUI-based
or scripted) 24 times. I can hear many of you groaning
(and I did, too), but if you think about it, it does enhance
serviceability (e.g., for rolling upgrades or patches),
since each node is essentially independent for patching.
I recommend that if you're going to install multiple-instance
failover clusters, you become very familiar
with scripted installs. To that point, a new feature (coming
post-CTP 6 and currently known as “cluster-prep”) provides
a preparation process similar to SysPrep. Look for this
feature in the release candidate and final build.


• During the initial checks that happen before the rest of
the install, the setup process leverages the Windows
Server 2008 Cluster Validate results, among other
things, to ensure that SQL Server 2008 failover clusters
are not built on a bad foundation.


• You can now select multiple drive letters during the
installation process. Previous versions of SQL Server
failover clustering only allowed you to select a single
drive during installation. This is obviously a vast
improvement over the past, where you had to add
additional drive letters or mount points after SQL Server
was installed.


• There is the new concept of an instance ID. This will be
used to help identify the installation path of your SQL
Server 2008 instances. For example, with SQL Server
2005 the path for an instance is something like
C:\Program Files\Microsoft SQL Server\MSSQL.N, where
N is the number of the instance. The directory structure
changes yet again with SQL Server 2008. It will be
something like
C:\Program Files\Microsoft SQL Server\MSSQL10.InstanceID,
where InstanceID is a unique
identifier you choose during the installation process.
You should develop a standard for how you name your
instances and understand how that translates into the
instance ID.


• Along with Windows Server 2008, SQL Server 2008
failover clustering supports IPv6 and allows you to use
it during the installation process.


Deploying SQL Server failover clustering 
on Windows Server 2008 failover clusters 
Deploying Windows itself is going to be a slightly 
different experience, but covering that process in detail 
requires more space than this article allows. And many 
DBAs will not install Windows (in my experience, some 
don't even install SQL Server). This section will focus
on the currently known considerations for deploying a
clustered SQL Server instance (either 2005 or 2008) on 
a Windows Server 2008 failover cluster. 


• Make sure that your IT organization is ready to support
Windows Server 2008. This means that there should
be a standard build of Windows Server 2008 that
has all the necessary tools, software, and drivers to
be supported. For example, ensure that you have
Windows Server 2008-compatible versions of antivirus,
monitoring software, HBA drivers, NIC drivers, and so
on. You get no benefit from a platform if no one can
support it.


• Windows Server 2008 failover clustering requires
the use of an Active Directory-based domain. If your
company is still using older domains that are not Active
Directory-based, you will not be able to implement
clustered instances of SQL Server using Windows
Server 2008 until your infrastructure is upgraded.


• There is no direct upgrade path for the operating
system from either Windows 2000 Server
or Windows Server 2003. Unlike prior versions of
Windows, where you could do a rolling upgrade of your
operating system, with Windows Server 2008 you will
need to purchase and deploy all new hardware for your
Windows Server 2008 failover cluster. This also means
you will need a method to migrate your databases
from cluster to cluster. Whether that is hardware-based
(such as SAN clones/snapshots/BCVs), backup and
restore, or detach and attach, this is an important
consideration in the move to Windows Server 2008.


Do not wait until it is too late
to start considering your eventual
Windows Server 2008 deployments
for SQL Server 2005
or SQL Server 2008.


e Another area of complexity during the migration 
will be that you cannot have two instances with the 
same name and IP address in the same domain 
simultaneously. Your migrations/upgrades will need to 
be carefully thought out and planned. 


* SQL Server 2000 at any Service Pack level will not 
be supported with Windows Server 2008 failover 
clusters. For those of you who still have a significant 
investment in SQL Server 2000 (and many of you do), 
you really need to start thinking about migrating those 
applications, databases, and instances to SQL Server 
2005 or SQL Server 2008. 


e The minimum supported SQL Server 2005 version that 
can be clustered is SQL Server 2005 with SQL Server 
2005 Service Pack 2. Many of you are already there, 
but if you are not, you will want to start testing your 
applications against Service Pack 2 to ensure that you 
will have no issues. 


* As with SQL Server 2000 and SQL Server 2005, you can
technically install SQL Server 2005 and SQL Server
2008 side by side in the same cluster, but you should
test to see if that configuration is what you want for the
long term.


* As of CTP 6, the basic requirements for rights for the
SQL Server-based service accounts (SQL Server, SQL 
Server Agent, and so on) are the same for SQL Server 
2008 as they were with SQL Server 2005. However, 
the security model for SQL Server 2005 and SQL 
Server 2008 will be slightly different. As many of you 
know, SQL Server 2005 introduced the requirement for 
domain-based groups needed for SQL Server cluster 
deployments. This requirement will be the same for 
deploying SQL Server 2005 on a Windows Server 2008 
failover cluster. However, you should use separate 
service accounts for SQL Server 2008 to ensure that 
your instances are secured separately from SQL Server 
2005 installations. SQL Server 2008 will not require 
domain groups with Windows Server 2008. SQL Server 
2008 will use Service SIDs. 


* The Core variation of Windows Server 2008 is not 
supported for clustered SQL Server implementations. 


* Both SQL Server 2005 and SQL Server 2008 will still
require a minimum of one drive letter per clustered 
instance under Windows Server 2008. Mount points are 
only supported under drive letters. 


* SQL Server 2005 and SQL Server 2008 do not support
the new OR functionality of Windows Server 2008
failover clustering described earlier. Similarly, SQL
Server 2005 and SQL Server 2008 do not support the
ability to utilize different subnets for cluster nodes, even
though Windows Server 2008 supports it. SQL Server
requires that all nodes are on the same subnet. I know
this will annoy some of you, but this is the way it works.


* While IP addresses for SQL Server instances via DHCP
are supported, you should choose static IP addresses
for greater predictability, unless you are only resolving
instances by name. SQL Server 2005 and SQL Server
2008 will support up to the maximum of 16 nodes in a
failover cluster with Enterprise Edition (Standard Edition
is still limited to two nodes in SQL Server 2008). You
will be more limited by resources such as drive letters
as to how many nodes and instances you can and
should deploy. The trend is that four- and five-node
clusters are becoming more commonplace; however,
using more nodes than that is not common for SQL
Server deployments at the time of writing this article.
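
Several of the points above depend on the version, service pack level, and edition of the instance you plan to move. As a quick inventory step before planning a migration, a query along the following lines can report what an instance is running. This is only a minimal sketch that relies on the documented SERVERPROPERTY function; the column aliases are arbitrary.

```sql
-- Inventory sketch: run against each instance you plan to migrate.
SELECT
    SERVERPROPERTY('ProductVersion') AS ProductVersion, -- 9.x indicates SQL Server 2005
    SERVERPROPERTY('ProductLevel')   AS ProductLevel,   -- RTM, SP1, SP2, and so on
    SERVERPROPERTY('Edition')        AS Edition,        -- Standard vs. Enterprise affects node limits
    SERVERPROPERTY('IsClustered')    AS IsClustered;    -- 1 = failover cluster instance
```

An instance that reports a ProductLevel below SP2 would need to be patched and retested before it could be clustered on Windows Server 2008, per the minimum noted above.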


Summary 

Windows Server 2008 brings a tide of change to 
clustered deployments of SQL Server. The operating 
system was rearchitected from the ground up, and it 
improves the experience at the Windows level over 
previous versions. With that comes the inevitable 
change to how SQL Server will be deployed in a 
clustered environment. Do not wait until it is too late to 
start considering your eventual Windows Server 2008 
deployments for SQL Server 2005 or SQL Server 2008 — 
start planning now. 
... include the book Pro SQL Server 2005 High Availability (Apress,
2007) and various articles for SQL Server Magazine. Before
striking out on his own in 2007, he most recently worked for both
... Microsoft on various projects. He can be reached via his Web ...


USING LARGE CLR UDTS IN SQL SERVER 2008


VarBinaryComp compresses and 
decompresses binary data up to 2GB in size 


New in SQL Server 2008 is support for large
CLR user-defined types (UDTs). We're going
to build a Common Language Runtime
(CLR) UDT that allows us to store, compress, and 
decompress binary data (up to 2GB in size) inside 
SQL Server 2008. This technique works only in SQL 
Server 2008 because only SQL Server 2008 allows 
CLR UDTs larger than 8KB. The UDT we'll build, 
called VarBinaryComp, accepts both uncompressed 
data (which it will compress before storing) and 
compressed data as input. It also has properties to 
determine the length of the compressed and uncom- 
pressed data. To implement this UDT, we’ll leverage 
the compression/decompression logic covered in “Zip 
Your Data,” InstantDoc ID 49065. For background 
on compression fundamentals, you might want to 
review this article and its accompanying sample code 
before proceeding. 


CLR UDTs 
A little refresher course on CLR UDTs might be 
helpful as well. A CLR UDT is a way of extending 
SQL Server’s existing type system by using a Micro- 
soft .NET Framework language such as C# or Visual 
Basic .NET, and it enables you to store CLR objects 
inside a SQL Server database. Don’t confuse a CLR 
UDT with a user-defined data type, which simply acts 
as an alias over a built-in data type. In addition to 
storage, a CLR UDT can provide properties and/or 
methods that allow for instantiation/manipulation of 
the underlying data. 

CLR UDTs aren't intended to turn SQL Server into 
a full-blown object-oriented database. See the Web-exclusive
sidebar “CLR UDT Caveats” (www.sqlmag.com,
InstantDoc ID 98304) for more information
about the proper place of CLR UDTs and methods for 
overcoming common objections to using them. 

An easy way to understand the potential power 
of a CLR UDT is to examine how to use the built-in 
XML data type. Introduced in SQL Server 2005, the
XML data type lets you store, retrieve, and manipulate 
data in XML form. Let's take a look at the code that I 
wrote in Web Listing 1 (www.sqlmag.com, InstantDoc 
ID 98305) as an example. After connecting to the 
AdventureWorks sample database, the code declares a 
variable (@XML) of type XML. The code then issues 
a relational query (by using the FOR XML AUTO 
clause) and stores the results in the XML variable. 
Next, the code uses the .value method of the XML 
data type to retrieve a single SQL type value from the 
XML data. Finally, the code demonstrates how the 
XML data type can be used as a column definition in 
a table (i.e., the HumanResources.JobCandidate table). 
The XML data type is pretty powerful stuff—and a 
great example of how the SQL Server Product Group 
has extended SQL Server’s relational model. We have 
the capability to do the same thing! 
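
For readers without Web Listing 1 at hand, the steps described above can be sketched roughly as follows. This is not the author's listing; it is a minimal illustration that assumes the AdventureWorks sample database shipped with SQL Server 2005 (the Person.Contact and HumanResources.JobCandidate tables). The Contact alias is introduced here only to give the XML element a simple name.

```sql
USE AdventureWorks;
GO

-- Declare a variable of the built-in XML type.
DECLARE @XML xml;

-- Store the result of a relational query as XML.
-- FOR XML AUTO names each element after the table alias (Contact).
SET @XML = (SELECT TOP (1) ContactID, FirstName, LastName
            FROM Person.Contact AS Contact
            ORDER BY ContactID
            FOR XML AUTO, TYPE);

-- Use the .value method to pull one SQL-typed value back out of the XML.
SELECT @XML.value('(/Contact/@FirstName)[1]', 'nvarchar(50)') AS FirstName;

-- The XML type can also be a column type, as in the Resume column here.
SELECT TOP (1) JobCandidateID, Resume
FROM HumanResources.JobCandidate;
```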


Developing a CLR UDT 

The process of developing a CLR UDT in SQL Server
consists of the following three steps:

1. Code and build the assembly that defines
   the CLR UDT. To do so, you can use
   any language supported by the .NET
   Framework CLR that produces verifiable
   code, including C# or Visual Basic .NET. The
   data is exposed as fields and properties of a .NET
   Framework class or structure, and behaviors are
   defined by methods of the class or structure.

2. Register the assembly. You can do so
   either by deploying the project from Visual Studio
LISTING 1: VarBinaryComp Implementation Excerpt

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.IO

' Callout A: Declaration of the VarBinaryComp structure.
' We implement the IBinarySerialize interface because this
' structure uses "UserDefined" serialization.
<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined, MaxByteSize:=-1)> _
Public Structure VarBinaryComp
    Implements INullable, Microsoft.SqlServer.Server.IBinarySerialize

    ' Callout B: UDTs must implement the ToString and Parse functions to allow
    ' conversion to/from strings. With binary data, these functions provide
    ' little practical use.
    Public Overrides Function ToString() As String
        Try
            If m_Bytes Is Nothing Then Return String.Empty
            ' Convert compressed byte array to string
            Return System.Text.Encoding.Unicode.GetString(m_Bytes.Value)
        Catch ex As Exception
            Throw ex
        End Try
    End Function

    ' Callout C: Required property used to determine nullability
    Public ReadOnly Property IsNull() As Boolean Implements INullable.IsNull
        Get
            If m_Bytes Is Nothing Then
                Return True
            End If
            Return False
        End Get
    End Property

    ' Required property used to return a new, null instance of the UDT.
    Public Shared ReadOnly Property Null() As VarBinaryComp
        Get
            Return New VarBinaryComp(False, Nothing, 0)
        End Get
    End Property

    ' Callout D: Example of a custom property (in this case, the uncompressed
    ' length of the binary data)
    Public ReadOnly Property UnCompressedLength() As SqlInt32
        Get
            Return m_UnCompressedLength
        End Get
    End Property

    ' Callout E: This custom shared function instantiates a new instance of
    ' VarBinaryComp, using uncompressed data as input.
    Public Shared Function ParseVarBinaryU(ByVal sUncompressedBytes As SqlBytes) As VarBinaryComp
        Try
            ' If we are passed in an empty byte array, return a Null instance
            If sUncompressedBytes Is Nothing Then
                Return VarBinaryComp.Null
            End If
            ' Otherwise, call the overloaded constructor to return a new instance
            Return New VarBinaryComp(False, sUncompressedBytes, 0)
        Catch ex As Exception
            Throw ex
        End Try
    End Function

    ' Callout F: This custom shared function instantiates a new instance of
    ' VarBinaryComp, using data that is already compressed as input.
    Public Shared Function ParseVarBinaryC(ByVal sCompressedBytes As SqlBytes, ByVal iUnCompressedLength As SqlInt32) As VarBinaryComp
        Try
            If sCompressedBytes.Value Is Nothing Then
                Return VarBinaryComp.Null
            End If

            Return New VarBinaryComp(True, sCompressedBytes, iUnCompressedLength)
        Catch ex As Exception
            Throw ex
        End Try
    End Function

    ' Callout G: UDTs must implement the ToString and Parse functions to allow
    ' conversion to/from strings. With binary data, these functions typically
    ' are of little value.
    Public Shared Function Parse(ByVal s As SqlString) As VarBinaryComp
        Try
            If s.IsNull Then
                Return VarBinaryComp.Null
            End If



   or by using the T-SQL CREATE ASSEMBLY statement,
   which copies the assembly containing the class or
   structure into a database. Note that I developed the
   VarBinaryComp CLR UDT by using Visual Studio 2005
   and the July CTP build of SQL Server 2008. At the time,
   deployment to SQL Server 2008 through Visual Studio
   was not yet supported. Therefore, we'll be using the
   T-SQL deployment method (along with an alternative
   debugging strategy), which I explain later. Also, don't
   forget that you need to first enable CLR integration in
   the SQL Server instance you're working with before you
   can use the CLR UDT. You can enable CLR integration
   by using the SQL Server Surface Area Configuration Tool.

3. Create the CLR UDT in SQL Server. After you've
   loaded an assembly into a host database, you use the
   T-SQL CREATE TYPE statement to create a CLR UDT
   and expose the members of the class/structure as
   members of the CLR UDT. CLR UDTs exist only in the
   context of one database and, once registered, have no
   dependencies on the external files from which they
   were created. (To learn how to use a CLR UDT across
   databases, refer to the SQL Server Books Online topic
   “Using User-defined Types Across Databases” at
   msdn2.microsoft.com/en-us/library/ms178069.aspx.)
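
Steps 2 and 3 above, together with enabling CLR integration, might look roughly like the following in T-SQL. Treat this as a hedged sketch rather than the contents of the author's Deployment.sql: the file path is a placeholder, the assembly name SqlSvr_CompUDT and the SAFE permission set are assumptions, and a Visual Basic project with a root namespace would need that namespace inside the brackets of EXTERNAL NAME.

```sql
-- Enable CLR integration on the instance (it is off by default).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

USE AdventureWorks;
GO

-- Step 2: register the compiled assembly in the database.
-- The path below is hypothetical; point it at your own build output.
CREATE ASSEMBLY SqlSvr_CompUDT
FROM 'C:\Path\To\SqlSvr_CompUDT.dll'
WITH PERMISSION_SET = SAFE;   -- assumption: the UDT needs no external access
GO

-- Step 3: expose the structure in the assembly as a SQL Server type.
CREATE TYPE dbo.VarBinaryComp
EXTERNAL NAME SqlSvr_CompUDT.[VarBinaryComp];
GO
```

Using sp_configure instead of the Surface Area Configuration tool keeps the whole deployment in one script, which is convenient when the same steps have to be repeated on several instances.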


The VarBinaryComp UDT
Let's now look at the VarBinaryComp UDT inside Visual Studio
2005. After downloading the sample code by going to
www.sqlmag.com, entering InstantDoc ID 98305, and clicking
the .zip file under Download the Code, open the
SqlSvr_CompUDT.sln file.
The UDT, VarBinaryComp, is implemented as a structure
in the excerpt of the VarBinaryComp.vb file that Listing
1 shows. (Web Listing 2 shows the complete
VarBinaryComp.vb file.) At callout A in Listing 1, the
declaration uses the Microsoft.SqlServer.Server
.SqlUserDefinedType attribute to identify this
structure as a CLR UDT. Within the attribute, two
property values have been set. The MaxByteSize property
has been set to -1, which lets the UDT store more
than the prior 8KB limit (up to 2GB).

The Format property has been set to UserDefined.
Format determines how the UDT is serialized,
as Native or as UserDefined. Format.Native,
as the name implies, uses native SQL Server binary
serialization. While very fast, this format is restricted
to fixed-length, value-type data types (e.g., int,
datetime). Because VarBinaryComp deals with
variable-length binary data, I chose Format.UserDefined. This
format gives us complete flexibility over the serialization
process, but it also requires that we implement
the IBinarySerialize interface. The IBinarySerialize
interface consists of two methods, Write and Read,
which appear at the end of Web Listing 2. Note these
methods will be called each time you write or read any
UDT property.

A CLR UDT must also implement the INullable
interface (which consists of the read-only property
IsNull) and a shared read-only property named Null (as
callout C in Listing 1 shows). This Null property needs
to instantiate and return a new instance of a UDT with
its IsNull property set to true.

Finally, a CLR UDT must also implement the
ToString and Parse methods to convert data to and
from a string value. As the name implies, the ToString
method (at callout B in Listing 1) converts the UDT to
a string representation. For the VarBinaryComp UDT,
this method isn't useful because viewing binary data as
a string provides little practical value. In fact, during
my testing, I discovered that this method raises an error
if the underlying data exceeds 8,000 bytes. Therefore, if
you need to view the binary data as a string (or convert
it to a string), you might want to consider adding a
custom method (e.g., ToNvarchmax) to overcome this
size limitation.

The Parse shared method (at callout G in Listing 1)
allows a string to be converted into a new instance of
a UDT. Although the Parse method isn't the focus of
this article, I did use it (which in T-SQL code involves
setting a UDT variable equal to a string value/variable)
to compress varchar, nvarchar, and nvarchar(max)
data types.

LISTING 2: Declaring and Instantiating a Variable of Type VarBinaryComp

-- Now Declare a variable of type VarBinaryComp.  Instantiate it using the
-- Shared (i.e. static) method ParseVarBinaryU
DECLARE @VarBC [VarBinaryComp];
SET @VarBC = VarBinaryComp::ParseVarBinaryU(@Document);



Optionally, a CLR UDT can implement additional 
methods and properties. VarBinaryComp implements 
several methods/properties specific to compressing/ 
uncompressing data. Let’s review a couple of these. 
ParseVarBinaryU (at callout E in Listing 1) is a shared 
function that creates a new instance of VarBinary- 
Comp—using uncompressed binary data as input. 
Notice that within ParseVarBinaryU, the code calls 
an “overloaded constructor” (i.e., the New function) 
to take care of the actual instantiation. It’s here that 
the code invokes the algorithm that compresses the 
binary data. 

ParseVarBinaryC (at callout F in Listing 1) is 
another shared function—but one that creates a new 
instance of VarBinaryComp with data that has already 
been compressed. (This could be useful when an appli- 
cation compresses the data before sending it to SQL 
Server.) VarBinaryComp also implements a read-only 
property, UnCompressedLength, that represents the 
uncompressed length of the binary data (at callout D 
in Listing 1). 
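
To see these members from the T-SQL side, a short script along the following lines could be used once the type is deployed. It is a sketch only: the @Document test value is made up here (repetitive data so that compression has something to work with), and only members named in the article (ParseVarBinaryU and UnCompressedLength) are called.

```sql
-- Build some repetitive test data (an assumption for illustration only).
DECLARE @Document varbinary(max);
SET @Document = CAST(REPLICATE(CAST('ABC' AS varchar(max)), 10000) AS varbinary(max));

-- Instantiate the UDT from uncompressed input by using the shared method.
DECLARE @VarBC VarBinaryComp;
SET @VarBC = VarBinaryComp::ParseVarBinaryU(@Document);

-- Read the custom property and compare it with the size of the original input.
SELECT @VarBC.UnCompressedLength AS UncompressedBytes,
       DATALENGTH(@Document)     AS OriginalBytes;
```

Note the two notations: a double colon for a shared method called on the type, and a period for a property read from an instance.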


Compile and Deploy VarBinaryComp

After opening the SqlSrv_CompUDT\SqlSrv_CompUDT.sln
file in Visual Studio, compile the
project by selecting Build SqlSvr_CompUDT from 
the Build menu. Then, open the Deployment.sql 
file (located in the SqlSrv_CompUDT\Test Scripts 
folder) inside SQL Server Management Studio. After 
changing the Directory Path variable to match your 
machine settings, run Deployment.sql to deploy the 
assembly to the AdventureWorks database. Note that 
in this script I have deployed both the .dll and .pdb files 
to allow debugging of this UDT. 

To verify your deployment, open the Test.sql file 
(also located in the SqlSrv_CompUDT\Test Scripts 
folder) inside Management Studio. Run this script 
to see how to use the VarBinaryComp UDT inside
T-SQL code. Web Listing 3 shows the complete
Test.sql script, but let me draw your attention to a few
points of interest. First, in Listing 2, look at how a
variable of type VarBinaryComp is declared and then
instantiated.

As discussed previously, the shared method
ParseVarBinaryU lets us instantiate the UDT by using
uncompressed binary data (i.e., a varbinary(max)
data type) as input. Notice the double-colon notation;
this is how you call a UDT shared method
from T-SQL code. Just to be clear, shared methods
can be called without first instantiating the UDT. For
example, the Test.sql code excerpt in Listing 3 calls
the shared method Compress to compress binary
data and store the results in a variable of the standard
type varbinary(max). In this manner, a UDT
can also be used as a “collection” of useful, related
functions.

LISTING 3: Calling the Shared Method Compress

-- If you just wanted to use the Shared Method to Compress Binary data, you
-- can do the following.
DECLARE @DocumentCompressed varbinary(max);
SET @DocumentCompressed = VarBinaryComp::Compress(@Document);


Debugging a CLR UDT

Typically, debugging a CLR UDT is a fairly simple
process that you can initiate within Visual Studio 2005
by selecting Start Debugging from the Debug menu.
But as I mentioned, this capability isn't supported with
the July CTP of SQL Server 2008. Here's an alternative
approach you can use:

1. Manually deploy your assembly to SQL Server
   2008 by using T-SQL (as we have already done).

2. With the SqlSvr_CompUDT.sln solution open in
   Visual Studio 2005, select Attach to Process from
   the Debug menu. Select the Show processes from all
   users check box, click sqlservr.exe in the Available
   Processes list box, and click Attach.

3. Set breakpoints as desired in your UDT code, then
   run T-SQL code, a client application, or some
   other program that will make use of the UDT. In
   this case, you can run the SqlSrv_CompUDT\Test.sql
   script from within Management Studio.

4. When you're finished debugging, select Stop
   Debugging from the Debug menu.


The Integrated Sample
I've also built a sample client that demonstrates
VarBinaryComp being used as part of an application.
Open the solution file (WinApp_TestUDT\
WinApp_TestUDT.sln) in Visual Studio, and
you'll see a simple Windows form application
(which Figure 1 shows). The application lets you
insert and select records that contain binary data
representing a file. You can let the UDT
handle all the compression/decompression
logic, or you can optionally compress
a file before inserting it.

Figure 1: Application to demonstrate use of VarBinaryComp

Before running this application, open
the CreateTableAndSps.sql file (located
in the WinApp_TestUDT\Test Scripts
folder) inside Management Studio; run
this script to create a table that uses
the VarBinaryComp UDT as one of its
columns. The script will also create
two stored procedures that the application
uses to insert and select data. The stored
procedure used to insert data takes as
input a standard varbinary(max) data
type; likewise, the stored procedure
used to select data returns a standard
varbinary(max) data type. By default, the compression/
decompression logic happens inside these stored
procedures, thereby making the UDT “transparent”
to the application.
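
The pattern described above, a UDT column hidden behind stored procedures that accept plain varbinary(max), could be sketched as follows. This is not the CreateTableAndSps.sql script: the table name dbo.DocumentStore, its columns, and the procedure name dbo.usp_InsertDocument are hypothetical, and only UDT members named in this article are used. The select side is shown as a simple query on the UnCompressedLength property, because the member that returns decompressed data isn't named in the excerpt.

```sql
-- Hypothetical table with a VarBinaryComp column.
CREATE TABLE dbo.DocumentStore
(
    DocumentID int IDENTITY(1,1) PRIMARY KEY,
    FileName   nvarchar(260) NOT NULL,
    Content    dbo.VarBinaryComp NULL
);
GO

-- Hypothetical insert procedure: the caller passes ordinary varbinary(max);
-- compression happens when the UDT instance is created.
CREATE PROCEDURE dbo.usp_InsertDocument
    @FileName nvarchar(260),
    @Content  varbinary(max)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.DocumentStore (FileName, Content)
    VALUES (@FileName, VarBinaryComp::ParseVarBinaryU(@Content));
END;
GO

-- UDT properties can be read directly from the column.
SELECT DocumentID, FileName, Content.UnCompressedLength AS UncompressedBytes
FROM dbo.DocumentStore;
```

Keeping the UDT behind stored procedures means a client application needs no reference to the assembly; it sees only standard SQL Server types.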


Where to Go from Here 

The VarBinaryComp CLR UDT is fully functional
and ready to help you reduce the size of binary data
stored inside SQL Server 2008. However, you might
want to consider implementing a number of potential
enhancements:

* Alter the CLR UDT to be compliant with the
popular Zip file format (described at
www.winzip.com/aes_info.htm#zip-format).

* Allow for different compression algorithms. In “Zip
Your Data,” I demonstrated how you could use
either of the two compression algorithms that ship
with version 2.0 of .NET Framework. You might
want to re-enable this capability and/or add support
for third-party algorithms.

* Add encryption. By using capabilities built into
.NET Framework, you can easily encrypt binary
data by making a “second pass” over the data after
it's been compressed (and then decrypt the data
before it's decompressed).


Support for large CLR UDTs really opens the door 
for developers to extend the built-in type system in 
SQL Server 2008. Although the potential for misuse 
exists, I believe this new capability—when used prop- 
erly—provides a great degree of flexibility for storing 
rich data types (and accompanying logic) inside the 
relational database.

InstantDoc ID 98305. 
Sharpen Your Basic SQL Server Skills


Learn about authentication modes and the 
names and uses of installed system databases 


Q: What's the difference between the Windows
Authentication Mode and the Mixed Mode of security
authentication?

A: The Windows Authentication Mode of security
authentication allows a user to connect to SQL Server
with a Windows user account. In this authentication
mode, SQL Server doesn't ask for a password; it
confirms the user's identity through the Windows
principal token that the OS created at logon. After the
user's identity is confirmed, SQL Server grants the user
access. Windows Authentication uses the Kerberos
security protocol and supports many standard group
policies related to login name and password. This
makes Windows Authentication Mode the most secure
method to connect to SQL Server.

Mixed Mode is used when a user can connect to 
the SQL Server machine via Windows Authentica- 
tion Mode or SQL Server Authentication. When 
you run a legacy application or use a non-trusted 
connection, SQL Server Authentication is automati- 
cally used instead of Windows Authentication. SQL 
Server receives the user’s login name and password 
and validates them with the previously created login 
name and password stored in SQL Server. After the 
user's credentials are validated, SQL Server grants the 
user access. Mixed Mode authentication is convenient 
for multi-platform systems that contain non-trusted 
connections and legacy applications. 

When you install SQL Server, Windows Authen- 
tication Mode is the default; SQL Server Authentica- 
tion is disabled. For information about enabling SQL 
Server Authentication, see the Microsoft article “How 
to: Change Server Authentication Mode”
(msdn2.microsoft.com/en-us/library/ms188670.aspx).
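
To check which mode an instance is currently using, you can query the documented IsIntegratedSecurityOnly server property. The following is a small sketch; the column alias is arbitrary.

```sql
-- 1 = Windows Authentication Mode only; 0 = Mixed Mode.
SELECT CASE CONVERT(int, SERVERPROPERTY('IsIntegratedSecurityOnly'))
           WHEN 1 THEN 'Windows Authentication Mode'
           WHEN 0 THEN 'Mixed Mode'
       END AS AuthenticationMode;
```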


Q: How many system databases are installed with SQL 
Server 2005, and how are they used? 

A: There are five system databases in a default instal- 
lation of SQL Server 2005: master, resource, model, 
msdb, and tempdb. You can install additional system 
databases if you perform additional setup. 

The master database contains all the system-level 
information (e.g., login accounts, server-wide database 
configurations, information about system and user 
databases, file groups), plus initialization information 
required to start SQL Server. A full backup must
be performed regularly. Even a small change to this 
sensitive data can cause SQL Server to behave in an 
unexpected manner. Therefore, the master database has 
many modification restrictions (e.g., changing owner, 
modifying file groups, dropping databases, making 
a database read only). In SQL Server 2005 all the 
system objects have been moved to the new resource 
database. 

The resource database contains all the system objects
in SQL Server, such as system tables, views, triggers
and functions, stored procedures, and constraints.
In SQL Server all system objects logically belong to the
sys schema of each database and can be accessed
with sys.objects, even though the system objects
physically reside in the resource database. The resource
database contains only system data; it doesn't contain any
user data. It isn’t possible to take a SQL Server backup 
of the resource database; instead, use a manual file 
system backup process. The resource database location 
depends on the location of the master database. If you 
move the master database, you must also move the 
resource database. If you move the resource database 
without moving the master database, SQL Server 2005 
won't start. The resource database is read-only, and you 
can't see it in SQL Server Management Studio (SSMS). 
This database facilitates SQL Server quick version 
upgrades and easy service pack rollbacks. 

The model database is a template for all new data- 
bases that you'll create. So any changes to this database 
will show up in subsequently created databases. Back up 
this database regularly. 

The msdb database contains scheduling alerts and
scheduled jobs for all user databases. This database is
used by log shipping jobs, Service Broker, Database
Mail, and other services. You need to back up this
database.

The tempdb database contains every temporary 
object (e.g., temporary tables, temporary stored proce- 
dures, table variables) and many other internal system 
objects of SQL Server (e.g., worktables, deleted and 
inserted tables from triggers). You don’t need to back 
up tempdb because it's recreated each time SQL Server
is started.
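
As a quick way to see these databases on your own instance, the following sketch lists the four visible system databases and then reads the version of the hidden resource database through documented server properties. The column aliases are arbitrary.

```sql
-- The four system databases that appear in the catalog.
SELECT name, database_id, recovery_model_desc
FROM sys.databases
WHERE name IN ('master', 'model', 'msdb', 'tempdb');

-- The resource database isn't listed in sys.databases,
-- but its version is exposed as a server property.
SELECT SERVERPROPERTY('ResourceVersion')            AS ResourceVersion,
       SERVERPROPERTY('ResourceLastUpdateDateTime') AS ResourceLastUpdated;
```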
Fundamentals


Most companies that are in business for a
while eventually encounter a disastrous
event that has the potential to put the
company out of business. And every company that 
uses databases will at some point experience a data- 
base crash. A database backup is a copy of the data, 
structures, and security objects contained within a 
database. Each database should be backed up on its 
own schedule, based on the number of write transac- 
tions that occur each day. To minimize data loss when 
disaster hits, you must back up your databases—all of 
them. And to ensure that your backups are good, you 
should test them by using them in a restore operation. 
At the very least, you need to have copies of your data- 
bases that you can quickly restore, and you need to be 
comfortable with the restore operation itself. 

After people, data is a company’s most valuable 
asset. Your responsibility as a DBA is to ensure that 
your company’s data is safe—that is, that you have a 
copy of the data that you can reinstate even if the entire 
data center is reduced to rubble. Database backups are 
the simplest, most cost-effective means of safeguarding 
your company’s data. 

Don't be lulled into a false sense of security by
a high-availability system you might have recently
purchased. If you've virtualized and consolidated,
then you might have actually increased your risk. Life
was easy when you ran just one SQL Server instance
per computer, but 10 SQL Server instances running
on virtual machines (VMs) can all come crashing
down when the physical box fails. If you can afford
the additional investment, you can avoid a disaster
of such large proportions by clustering the virtual
server hosts. High-availability schemes are crucial
when your data and systems must be available all
the time. But even high-availability systems can be
affected by fire, flood, and earthquake. You still need
to do backups. Not just anyone in a company can
do a database backup. For information about who
should be backing up your databases, see the Web
sidebar “Who Can Do Backups?” www.sqlmag.com,
InstantDoc ID 98372.

How often you should back up a database depends
on how long you have to restore it. In general, the more
often you back up a database, the shorter the restore
time, although the type of backup you take also matters.
You can tailor backups and restores for each database.
The kind of backup you decide to use will depend on
the size of the database and the amount of transactional
activity. The three most common types of backups are
full, log, and differential. (For information about
recovery models, see the Web sidebar “Database Recovery
Models,” www.sqlmag.com, InstantDoc ID 98373; for
information about SQL Server backup commands, see
the Web sidebar “Standard Backup Commands,”
www.sqlmag.com, InstantDoc ID 98374.)

Full Backups

The full backup strategy is the easiest to understand
and implement. At the end of every business day, or
during whatever time window you've allocated for
database backups, you simply perform a full backup
of the database, as Figure 1 illustrates. You perform no
separate log backups, and you don't have to remember
any special parameters. Backup file management is
simple, because you only need to manage the full
backup file. In addition, restoring from a full backup
is extremely easy because you have only one full backup
file to apply. Full backups are especially useful in
organizations with a limited or relatively new IT staff.

Figure 1: Job schedule for a full-backup strategy

Figure 2: Job schedule for a full-plus-log backup strategy

Sidebar: Restore a SQL Server 2000 Database to SQL Server 2005
You can't make a backup of a SQL Server 2005 database,
then restore it as a SQL Server 2000 database, but you
can do the opposite. A SQL Server 2000 database that's
restored or attached to an instance of SQL Server 2005
will still run at the version 8.0 compatibility level (SQL
Server 2005 databases are version 9.0).

A full backup works best for a “small” database— 
which you can define as a database whose full backup 
can be completed in the time allowed. When SQL
Server performs a full database backup, it first backs
up all the extents on the hard disk (an extent is eight
contiguous pages, with each page being 8KB in size,
so each extent is 64KB).
Then, SQL Server backs up the transaction log so that
any user changes made during the database backup are
also captured in the full backup file.

If you're performing only full backups, you might 
lose some data in the event of a system crash—specif- 
ically, any changes made since the last full backup. If 
your database is updated infrequently, such as by high- 
speed bulk operations, then you can plan full backups 
to run only immediately after the bulk data modifica- 
tions, and your data should be protected. 

Full backups aren’t appropriate for production 
systems that have anything other than a few transac- 
tions. After you use the full backup strategy to restore 
a database, you must redo any transactions or bulk 
data loads that were applied to the database after the 
backup. If your most recent backup file is damaged,
you need to use the previous full backup to restore
the database, and you'll have to ensure that all
transactions applied to the database after that backup
are manually redone.

To perform a full backup, run the following code: 


BACKUP DATABASE AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_FullDbBkup.bak'
WITH INIT, NAME = 'AdventureWorks Full Db backup',
DESCRIPTION = 'AdventureWorks Full Database Backup'


DISK is the destination for the backup file. You can 
back up to disk or to tape; in this case, you're backing 
up to the hard disk. Make sure the folder you'll use as 
a store for the backup files exists before you begin. In 
most cases, backing up to hard disk is faster but more 
expensive than backing up to tape. For an extra level of 
protection, you can first back up to hard disk, then per- 
form a file-level backup to write the database backup 
file to tape. WITH INIT specifies that the backup file 
should be overwritten. This method works well as long 
as a Windows backup occurs after every database 
backup. NAME is the name you give the backup set,
which can contain as many as 128 characters. If you
don't specify a name, it's left blank. DESCRIPTION
is a longer, friendly description that makes identifying
the backup weeks or months after it was made a
relatively straightforward process.

To perform a full restore of the database, run the 
following code: 


RESTORE DATABASE AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_FullDbBkup.bak'
WITH RECOVERY, REPLACE


WITH RECOVERY instructs the restore operation 
to roll back any uncommitted transactions that might 
be on the transaction log and leave the database in 
operational mode, ready to resume work. REPLACE 
means overwrite any existing file with the same name. 
For more information, see the Web sidebar “Replacing 
a Database,” www.sqlmag.com, InstantDoc ID 98375. 

If you use the full backup strategy, you need to 
monitor the size of the transaction log. A full backup 
doesn’t truncate (remove inactive entries from) the 
transaction log. If you perform only full database 
backups, you should follow the full backup with a log 
backup using the TRUNCATE_ONLY option, as the 
following code shows: 


BACKUP LOG AdventureWorks 
WITH TRUNCATE_ONLY 


TRUNCATE_ONLY doesn't back up the transaction
log; it simply forces SQL Server to take a checkpoint,
which then truncates the log, getting rid of inactive
entries and freeing space inside the log file for reuse
(the physical file itself doesn't shrink). Because this
option will be dropped in future releases of SQL
Server, you might instead use the simple recovery model
and let SQL Server automatically rid the transaction
log of inactive entries.
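
If you go the simple recovery model route, the switch is a single statement. The following is a minimal sketch, assuming the AdventureWorks database used throughout this article; the second statement is only a sanity check.

```sql
-- Switch to the simple recovery model so that SQL Server clears
-- inactive log entries automatically at each checkpoint.
ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE;

-- Confirm the setting; this returns SIMPLE, FULL, or BULK_LOGGED.
SELECT DATABASEPROPERTYEX('AdventureWorks', 'Recovery') AS RecoveryModel;
```

Keep in mind that under the simple recovery model you can no longer take log backups, so this choice fits only a strategy built on full backups alone.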


Full Plus Log Backups 

If you can't tolerate any data loss on restore, use the
full plus log backup strategy. This method minimizes
the risk of data loss and works well for databases that
are updated frequently. Although using this strategy
makes your backup routine more complex to maintain,
the total amount of time necessary to back up the
database will decrease.

Figure 2 shows a sample schedule for a full plus log 
backup strategy—a weekly full backup on Sunday, a 
transaction log backup on Monday, a second log backup 
on Tuesday, and a log backup every day of the week
until the following Sunday rolls around, when you take
a new full backup. A log backup includes all data and
structures that have changed since the last log backup.
Thus, each log backup in this schedule contains only the
changes for that day: Monday's log backup contains all
of Monday's changes, Tuesday's log backup contains all
of Tuesday's changes, and so on.

Unless you specify otherwise, inactive records in the
log are "removed" (marked for overwriting) at the end of
a transaction log backup by default. You can add
NO_TRUNCATE or COPY_ONLY to the BACKUP LOG
command, which will leave the log records as they were
before the log backup began. However, you shouldn't
use these options unless you have a lot of experience.

SQL Server 2005 lets you perform a tail-log 
backup, which is a backup taken after a database 
crash—assuming that the transaction log file isn’t cor- 
rupt. A tail-log backup captures the last few transac- 
tions since the last transaction log backup. (For a more 
complete explanation of tail-log backups, see the Web 
sidebar "What Is a Tail-Log Backup?" www.sqlmag.com,
InstantDoc ID 98376.)

Using the full recovery model provides relatively 
straightforward recovery and is preferable when using 
the full plus log backup strategy. You simply restore 
the full backup followed by each of the transaction log 
backups in chronological order (i.e., the order in which
they were taken), finishing with a restore of the tail- 
log backup. This strategy works well for production 
systems, especially those that are mostly transactional 
with few bulk operations. 

If your database has regular bulk operations (e.g., 
bulk inserts done daily), then you might want to use 
the bulk-logged recovery model. Because individual
records included in the bulk operation aren't logged,
this approach reduces the overhead of SQL Server
writing to the transaction log. Although you might
achieve a performance advantage during the time the
bulk operations are running, you run the risk of losing
data on a restore operation if you don't have the source
data needed to rerun the bulk operations. If you're
using the simple recovery model, you can't perform a
log backup because this model causes the log file to be
truncated on checkpoint.
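
One common way to apply the bulk-logged model is to switch to it only for the duration of the load. The following is a sketch under that assumption, not a prescription; the bulk load itself is left as a placeholder comment, and the log backup file name is the one used in this article's log backup example.

```sql
-- Switch to bulk-logged just before the bulk operation.
ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;

-- ... run the bulk operation here (for example, BULK INSERT) ...

-- Switch back to full recovery, then back up the log right away
-- so that the log backup chain covers the bulk-loaded data.
ALTER DATABASE AdventureWorks SET RECOVERY FULL;

BACKUP LOG AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak'
WITH NOINIT;
```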

To perform a full plus log backup, you must first 
back up the entire database, as follows: 


BACKUP DATABASE AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_FullDbBkup.bak'
WITH INIT, NAME = 'AdventureWorks Full Db backup',
DESCRIPTION = 'AdventureWorks Full Database Backup'


Then run the following code for the transaction log
backup:


BACKUP LOG AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak'
WITH NOINIT, NAME = 'AdventureWorks Translog backup',
DESCRIPTION = 'AdventureWorks Transaction Log Backup', NOFORMAT


WITH NOINIT specifies that the backup files should 
be appended to the backup media, whether you're 
using disk or tape. In this case, all the transaction 
log backups will be written to the same disk file, one 
after another, in sequence. NOFORMAT instructs the
backup process to preserve the existing media header
on the backup media.
This behavior is the default, so you don't necessarily
need to use this option, although doing so is helpful
for self-documentation.
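
Because NOINIT appends every log backup to the same file, each backup becomes a numbered backup set inside that file. The following is a minimal sketch, assuming the log backup file name used above, of how you might list those sets and then restore a specific one by position.

```sql
-- List every backup set stored in the log backup file.
-- The Position column gives the number to use with FILE = n.
RESTORE HEADERONLY
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak';

-- Restore the first log backup in the file; repeat with FILE = 2, 3, ...
-- in order (the database must already be in the restoring state).
RESTORE LOG AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak'
WITH FILE = 1, NORECOVERY;
```

If you omit FILE, SQL Server restores the first backup set in the file, so spelling out the position is the safer habit when a file holds a week of log backups.
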
To restore a database that’s using a full or full plus 
log backup strategy, perform the following steps. 
1. If the database is online, restrict database access 
by switching the database availability option (in 
the property window) to RESTRICTED_USER, 
which allows only members of the db_owner fixed 
database role and members of the dbcreator and 
sysadmin fixed server roles to access the database. 
2. Perform a tail-log backup (new to SQL Server 
2005). 
3. Fix the problem that caused the database to crash. 
4. Restore using the full backup with the NORECOVERY
option.
5. Apply each of the transaction log backups with the 
NORECOVERY option, if applicable. 
6. Restore the tail-log backup with the RECOVERY 
option. 
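
Step 1 doesn't have to be done through the property window. As a sketch, assuming you're willing to roll back whatever other users have in flight, the same restriction can be set in T-SQL and lifted again once the restore is complete:

```sql
-- Restrict access before the restore. ROLLBACK IMMEDIATE rolls back
-- open transactions and disconnects users who don't qualify.
ALTER DATABASE AdventureWorks
SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

-- After the final restore step completes, reopen the database to everyone.
ALTER DATABASE AdventureWorks SET MULTI_USER;
```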


The code to perform a tail-log backup is as 
follows: 


BACKUP LOG AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TaillogBkup.bak'
WITH NORECOVERY


To perform a complete restore from a full backup, 
you must first restore the files for the database as 
follows: 


RESTORE DATABASE AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_FullDbBkup.bak'
WITH NORECOVERY


NORECOVERY instructs the recovery operation to 
leave partial transactions intact rather than roll them 
back. The transaction log backup(s) that follow the 
full database restore contain additional data that 
complete these partial transactions. NORECOVERY 
leaves the database in a nonoperational state.

Figure 3: Job schedule for a differential backup strategy

The full restore is immediately followed by a restore of
each of the transaction log backups in chronological
order, all using NORECOVERY as follows:


RESTORE LOG AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak'
WITH NORECOVERY


Finally, apply the tail-log backup with the 
RECOVERY option, as follows: 


RESTORE LOG AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TaillogBkup.bak'
WITH RECOVERY


The full plus log backup strategy isn’t bulletproof. 
If one of the transaction log backups is corrupted, 
then you can restore only to a point before the cor- 
rupted log backup. For instance, suppose you run a 
weekly full backup on Sunday and transaction log 
backups on Monday through Saturday. If Tuesday’s 
log backup is corrupt, you can restore only through 
Monday’s backup. All of Tuesday’s work would be 
missing because of the corrupted log backup, so 
you wouldn't want to risk violating data integrity by 
applying Wednesday’s transactions to Monday’s data. 
Even the tail-log backup would be useless. 


Full Plus Differential Backups 

If you want an extra level of insurance, consider adding 
differential backups to your full backup scheme instead 
of doing just log backups. This strategy is good for a 
transactional database that has many record inserts 
and updates and that can sustain little to no data loss 
on restore and recovery, as well as for administrators 
who place a priority on fast recovery. 

A differential backup is cumulative; it includes all 
data and structures that have changed since the last full 
backup, regardless of when that last full backup was 
made, or how many previous differential backups have 
been run. Suppose you perform a full backup on Sunday 
and differential backups on subsequent days of the week, 
as Figure 3 illustrates. Monday’s differential backup will 
contain all of Monday’s changes, Tuesday's differential 
backup will contain all of Monday’s plus Tuesday’s 
changes, Wednesday’s differential backup will contain all 
of Monday’s plus Tuesday’s plus Wednesday’s changes, 
and so on. 
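
To see which full, differential, and log backups you actually have on record before planning a restore, you can query the backup history that SQL Server keeps in msdb. The following is a minimal sketch, assuming the history tables haven't been purged.

```sql
-- List recorded backups for AdventureWorks, oldest first.
-- backupset.type: D = full database, I = differential, L = log.
SELECT bs.backup_start_date,
       bs.backup_finish_date,
       CASE bs.type
            WHEN 'D' THEN 'Full'
            WHEN 'I' THEN 'Differential'
            WHEN 'L' THEN 'Log'
            ELSE bs.type
       END AS BackupType,
       bs.name AS BackupSetName
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = 'AdventureWorks'
ORDER BY bs.backup_finish_date;
```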

Restoring a differential backup generally takes less
time than restoring a full plus log backup, because
restoring just one differential backup takes less time
than restoring a string of log backups. To perform a
differential backup, run the following code:


BACKUP DATABASE AdventureWorks
TO DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_DiffDbBkup.bak'
WITH INIT, DIFFERENTIAL, NAME = 'AdventureWorks Diff Db backup',
DESCRIPTION = 'AdventureWorks Differential Database Backup'


To restore a database using the full plus differential 
strategy, perform the following steps. 

1. If the database is online, restrict database access
by switching the database availability option (in
the property window) to RESTRICTED_USER,
which allows only members of the db_owner fixed
database role and members of the dbcreator and
sysadmin fixed server roles to access the database.
2. Perform a tail-log backup (new to SQL Server 2005).
3. Fix the problem that caused the database to crash.
4. Restore using the full backup with the NORECOVERY
option.
5. Apply the latest differential backup with the
NORECOVERY option.
6. Apply the tail-log backup with the RECOVERY
option.


After restoring the full backup, do a differential 
restore as follows: 


RESTORE DATABASE AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_DiffDbBkup.bak'
WITH NORECOVERY


Then, restore the tail-log backup with the RECOVERY 
option, as discussed previously. 

The differential backup provides a level of insurance
you can't achieve when performing only log
backups. If the most current differential backup is
corrupted, you can still restore from the previous
differential and maintain full data integrity.


Combining Strategies 

If redoing transactions for the missing day isn’t prac- 
tical, you can combine full, differential, and multiple 
daily log backups. For example, you could perform 
a full backup on Sunday and differential backups 
on subsequent nights, plus log backups on Monday
through Saturday mornings and afternoons, as
Figure 4 shows. If the database came down on Friday 
night and needed to be restored, but Thursday’s differ- 
ential backup was corrupted, you could use Wednes- 
day’s differential for the restore, then restore the log 
backups taken on Thursday and Friday. The database 
would then be restored to the point of failure. For more 
information, see the Web sidebar "How Do I Recover
to a Point in Time?" www.sqlmag.com, InstantDoc
ID 98377.
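
For orientation, a point-in-time restore generally uses the STOPAT option on the final log restore. This is only a sketch: the date and time below are hypothetical, and it assumes the full backup (and any earlier log backups) have already been restored WITH NORECOVERY.

```sql
-- Roll the log forward only up to the given moment, then recover.
-- The STOPAT value is a made-up example; substitute your own.
RESTORE LOG AdventureWorks
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_TlogBkup.bak'
WITH STOPAT = '2008-03-14 16:30:00', RECOVERY;
```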

To minimize the risk of data loss, you should con- 
sider mixing and matching full, log, and differential 
backups, even though doing so will complicate your 
backup strategy and backup file management. You 
also need to realistically evaluate how much data 
loss you can live with following a database crash and 
restore. Using a full or bulk-logged recovery model 
rather than a simple recovery model means more trans- 
action log file activity and a larger (and longer) log file 
backup, but the benefit is less lost data. 


Alternative Backup Strategies 

SQL Server backups aren’t limited to full, log, and dif- 
ferential. More advanced options include the file or file 
group backup strategy, the partial backup strategy, and 
the copy-only backup strategy. For information about 
these strategies, see the Web sidebar “Alternative Backup 
Strategies,” www.sqlmag.com, InstantDoc ID 98956. 


Database Access During Backups and Restores

SQL Server backups are an online process; the data 
stored in SQL Server is highly available during this 
time. Operations such as INSERT, UPDATE, and 
DELETE are allowed, as are SELECT statements. 
However, operations that would modify the underlying
table or file space architecture, such as ALTER
DATABASE ... ADD FILE or DBCC SHRINKFILE, can't
be done while the backup is running. If auto-shrink
is turned on in your database options, you
might experience a conflict during a backup operation.
For example, if auto-shrink tries to initiate while the
backup is running, both operations might fail.
Whichever operation starts first will set a lock on the
file; the second operation will wait for that lock to be
released before it begins. If the first operation releases
the lock, then the second operation will commence. If
the lock times out on the first operation, the second
operation will fail. This development seems unfair to
the second operation, which has to wait for the lock
timeout, only to then fail. However, the rationale is
that the second operation's viability is based on the
first operation succeeding. If the first operation fails,
the second operation doesn't need to proceed. To
prevent this problem, consider turning off auto-shrink
before performing a backup.
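
Auto-shrink is a database option, so you can check it and turn it off in T-SQL. The following is a minimal sketch, again assuming the AdventureWorks database.

```sql
-- Returns 1 if auto-shrink is on, 0 if it's off.
SELECT DATABASEPROPERTYEX('AdventureWorks', 'IsAutoShrink') AS IsAutoShrink;

-- Turn auto-shrink off so it can't collide with a running backup.
ALTER DATABASE AdventureWorks SET AUTO_SHRINK OFF;
```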

Most SQL Server restores are offline operations; users
can't access the database while it's being restored. If
you're using SQL Server 2005 Enterprise Edition with
the full recovery model, partial restores and restores of
nonprimary file groups are online operations by default.
The parts of the database that aren't being restored,
such as read-only file groups, are accessible throughout
the entire restore operation. Read/write file groups are
available except when they're pulled offline to be
restored. This option is immensely valuable for large
databases that are heavily accessed 24 x 7 x 365. For
more information, see SQL Server 2005 BOL,
"Performing Online Restores,"
msdn2.microsoft.com/en-us/library/ms188671.aspx, as
well as the Web sidebar "Why Can't My Database
Restore Be an Online Operation?" www.sqlmag.com,
InstantDoc ID 98378.




Putting It All Together 
Data has become so central to businesses’ success that 
safeguarding it is mandatory. Backups are therefore 
crucial to maintain a healthy back office environment. 
The first step toward building business and database 
continuity is to make regular database backups and test 
them to ensure they can be successfully restored. When 
you create a new database, you should write the backup 
and restore scripts at the same time. SQL Server gives 
you many backup and restore options, which you can 
customize to meet the needs of each database.
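
A quick first check on any backup file is RESTORE VERIFYONLY. The sketch below assumes the full backup file created earlier in this article. It confirms that the backup set is complete and readable, but it doesn't restore anything, so it complements a real test restore rather than replacing one.

```sql
-- Verify that the backup set is complete and readable.
RESTORE VERIFYONLY
FROM DISK = 'E:\SQLdata\BACKUPS\AdventureWorks_FullDbBkup.bak';
```
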
InstantDoc ID 98371

Figure 4: Job schedule for a combined full-, differential-, and log-backup strategy


Who's Hogging My Server?


Pondering SQL Server performance metrics 


I recently spoke with someone who was thinking
about creating multiple SQL Server instances on his 
server and placing one database in each instance. You 
might wonder why he would want to do such a thing. 
He was concerned about system performance and 
thought that by placing one database in each instance, 
he could monitor them independently so that he'd 
know which ones were using the most resources. 
Seems like a reasonable DBA request these days, 
right? In fact, it’s one of the most common scenarios I see 
among DBAs who have to account for separate applica- 
tions that share a common database server—particularly 
when server consolidation is high on the list of corporate 
priorities. Certainly, there are some situations in which 
you should use multiple SQL Server instances, but you 
don’t typically see performance monitoring among those 
situations. Suffice it to say, there’s a much easier and 
cleaner way to get the kind of metrics that multiple SQL 
Server instances might provide—by using SQL Server 
2005's built-in dynamic management views (DMVs). 


Cost and Performance Questions

Two questions arise from situations involving multiple 
applications on a common database server, and they 
involve cost and performance. 

Listing 1: Query to Identify and Aggregate Query_stats DMV Information

USE master
SELECT a.[value] AS [dbid]
, ISNULL(DB_NAME(CONVERT(INT,a.[value])),'Resource') AS [DB Name]
, SUM(qs.[execution_count]) AS [Counts]
, SUM(qs.[total_worker_time]) / 1000 AS [Total Worker Time (mSecs)]
, SUM(qs.[total_physical_reads]) AS [Total Physical Reads]
, SUM(qs.[total_logical_writes]) AS [Total Logical Writes]
, SUM(qs.[total_logical_reads]) AS [Total Logical Reads]
, SUM(qs.[total_clr_time]) / 1000 AS [Total CLR Time (mSecs)]
, SUM(qs.[total_elapsed_time]) / 1000 AS [Total Elapsed Time (mSecs)]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_plan_attributes(qs.plan_handle) AS a
WHERE a.[attribute] = 'dbid'
GROUP BY a.[value], ISNULL(DB_NAME(CONVERT(INT,a.[value])),'Resource')
ORDER BY [Total Worker Time (mSecs)] DESC

What's the cost? First, you need to determine each
application's ratio of server usage. This metric arises
mainly from corporate policies in which different
departments must pay an internal fee for their use (really,
their applications' use) of the consolidated database server.
There are several methods that companies use to decide 
how much each application must pay compared with 
others. Some use a ratio of individual database size to 
the total size of all the databases. Some make the deci- 
sion based on the number of users for each application. 
But one of the most requested methods is to make the 
decision based on how much database-server CPU each 
application uses in relation to the other applications. In 
the past, this metric wasn’t always easy to determine and 
typically involved running a 24 x 7 trace to permit data 
analysis to obtain the ratio on a per-application basis 
from the calls to each database. 

How’s the performance? The second and most 
important question is how each application affects 
overall server performance. For example, suppose you 
have 10 applications and each one has one database. 
How do you determine which application is using the 
most resources on the server? That question is only part 
of the story. Just because an application is using lots 
of resources doesn’t tell you whether that application 
really needs to use so many resources for proper opera- 
tion. However, because all 10 databases are on the same 
SQL Server instance, you'll probably find it difficult to 
isolate resource usage and performance problems on a 
database-by-database level. 


A Better Solution 

You can see why many people believe that they can easily
identify the aforementioned metrics by isolating each
database in its own SQL Server instance. Without going
into all the pros and cons of using multiple SQL Server
instances, there's a better option available to you, and
that is to use SQL Server 2005's built-in DMVs. In my
article "Are Your SQL Server Statements Performing
Well?" (InstantDoc ID 97761), I discussed how to use the
sys.dm_exec_query_stats DMV to gather performance-related
data on individual query plans. The one detail missing
from that DMV is the database ID (DBID) in which each
query is run. Without the DBID, it would be impossible to
track the metrics on a database-by-database level (hence,
application level) without resorting to a much more
involved trace or some other complex process.

You can, however, use another DMV, called
sys.dm_exec_plan_attributes, to get the DBID. This
DMV lets you access individual attributes (e.g., the
DBID, date format, user ID) for each query plan
by simply cross-applying the DMV and specifying
the other DMV's plan_handle identifier.

Now, with the relatively simple query that Listing 1
shows, you can identify and, more important, aggregate
the information from the query_stats DMV by each
database. Note that the DBID for ad hoc or prepared
plans is the DBID from which the batch is executed.
If you're connected to Northwind and run a query
against pubs, the stats would show up under Northwind.
For example,

use Northwind; select * from pubs.dbo.sales


Listing 1's query represents only a subset of the 
information available through the DMV, but it answers 
the two aforementioned questions. Running the query 
gives you a quick way to see the total resources used by 
all the queries currently in the plan cache at a database- 
by-database level. You'll probably be most interested 
in the Total Worker Time column because it displays 
the amount of CPU used to process all the queries for 
a given database. You can then easily determine how
much of the server's CPU each application used in
relation to the others and to the total.
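
To turn the worker-time totals into the per-application share that a chargeback scheme would need, you could wrap the same DMV join in a common table expression and divide each database's total by the grand total. The following is a sketch rather than part of Listing 1; the names DbCpu, DbName, WorkerTimeMs, and CpuPercent are illustrative.

```sql
USE master;

-- Total CPU (worker) time per database for plans currently in cache.
WITH DbCpu AS
(
    SELECT ISNULL(DB_NAME(CONVERT(INT, a.[value])), 'Resource') AS DbName,
           SUM(qs.total_worker_time) / 1000 AS WorkerTimeMs
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_plan_attributes(qs.plan_handle) AS a
    WHERE a.[attribute] = 'dbid'
    GROUP BY ISNULL(DB_NAME(CONVERT(INT, a.[value])), 'Resource')
)
-- Each database's share of the total; NULLIF guards against
-- dividing by zero when the plan cache holds no worker time.
SELECT DbName,
       WorkerTimeMs,
       CAST(100.0 * WorkerTimeMs
            / NULLIF(SUM(WorkerTimeMs) OVER (), 0) AS DECIMAL(5, 2)) AS CpuPercent
FROM DbCpu
ORDER BY WorkerTimeMs DESC;
```

Because the figures cover only the plans currently in the cache, the percentages describe recent activity, not usage since the server started.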


Effective and Free 
This DMV method isn’t a foolproof way to track 
application usage or performance, but it’s simple, effec- 
tive, and—best of all—free. If you're concerned that 
one or more databases might be loading down your 
server, use this approach to set your mind at ease. But 
keep in mind my March warnings about plan cache 
volatility.
InstantDoc ID 98760.


How to use the GROUP BY clause in SELECT statements


You can begin to tap into the true power of
T-SQL by using the GROUP BY clause in
SELECT statements. Grouping data lets
you produce reports that answer complex questions 
instead of reports that answer only basic questions. For 
example, with the GROUP BY clause, you can produce 
a report that answers the question “What is the average 
size of the bonus paid to each employee over the past 
10 years?” instead of “What is the average size of the 
bonus paid out to employees?” 

Depending on the level of detail you need in your 
reports, you can use the GROUP BY clause to group 
data by values in one or more columns. You can even 
use the HAVING clause to further refine your reports. 
Before I show you how to do so, though, you might 
want to create and populate a couple of tables so you 
can follow along. 


The Prerequisites 

To help demonstrate grouping, I created two tables:
Bonus and MovieReview. The Bonus table contains the
bonus payments given to eight employees in the past 10
years. This table contains three columns: EmployeeID,
Amount, and PaymentDate. The MovieReview table
contains the ratings that the five employees have given
to movies they've watched in their spare time. This
table contains four columns: EmployeeID, Genre,
MovieName, and Stars. The Stars field specifies the
movie's rating, where 1 star is the worst rating and 5
is the best rating.

You can create and populate these tables by fol- 
lowing these steps: 

1. Download the CodeToCreateBonusTable.sql,
CodeToPopulateBonusTable.sql,
CodeToCreateMovieReviewTable.sql, and
CodeToPopulateMovieReviewTable.sql files. Go to
www.sqlmag.com, enter 98711 in the InstantDoc
ID text box, click Go, then click the 98711.zip
hotlink.

2. Create the Bonus table. Open SQL Server 2005's
SQL Server Management Studio (SSMS) or SQL
Server 2000's Query Analyzer and copy the code
in CodeToCreateBonusTable.sql into the query
window. In the first line of code, change MyDB to
the name of your database. Execute the code.

3. Populate the Bonus table by running
CodeToPopulateBonusTable.sql in SSMS or Query Analyzer.

4. Create the MovieReview table. Copy the code
in CodeToCreateMovieReviewTable.sql into SSMS's or
Query Analyzer's query window. In the first line of
code, change MyDB to the name of your database.
Execute the code.

5. Populate the MovieReview table by running
CodeToPopulateMovieReviewTable.sql in SSMS or
Query Analyzer.


Grouping Data Using One Column

I remember spending hours writing COBOL programs
to produce reports. They took even longer to debug.
Today, it takes me only a few minutes to crank out
the equivalent T-SQL code to produce similar reports,
thanks in part to the GROUP BY clause.

Figure 1: The total amount of bonuses paid to each employee

Figure 2: The average bonus and the number of bonuses paid to each employee

Figure 3: Comparison of each employee's average bonus with the overall corporate average

When you use the GROUP BY clause in a SELECT
statement, two things happen:

1. GROUP BY uses the grouping criterion you
specify to group the data being returned by the
SELECT query. Typically, the grouping criterion
is a column, in which case GROUP BY groups
the data into the possible values in that column.
For example, the EmployeeID column in the
Bonus table has eight possible values (1 through
8), so GROUP BY would group the data being
returned by the SELECT query into those eight
groups. Similarly, the Stars column in the
MovieReview table has five possible values
(1 through 5), so GROUP BY would group the
data into those five groups. GROUP BY returns
only one row for each possible group. For
example, GROUP BY would return eight rows
when you group by the EmployeeID column and
five rows when you group by the Stars column,
assuming there is data in each group.

2. After GROUP BY is done grouping the data, the
aggregate function specified in the SELECT query
is performed on each group rather than on the
entire result set. So, for example, if the SELECT
query specifies to use the AVG function to get the
average of the values in the Amount column in
the Bonus table and you grouped the data by the
EmployeeID column, individual averages will be
calculated for each of the eight groups instead of
one overall average.
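
To make the Stars example concrete, here is a sketch against the MovieReview table described earlier. The Reviews alias is illustrative; the query returns one row per star rating that actually appears in the table, along with the number of reviews in that group.

```sql
-- One row per distinct Stars value, with the count of reviews in each group.
SELECT Stars,
       COUNT(*) AS 'Reviews'
FROM MovieReview
GROUP BY Stars
ORDER BY Stars
```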



These concepts can be confusing for people new
to T-SQL, so let's take a look at a few examples. Note
that in trying to keep these examples as straightforward
as possible, I'm minimizing the use of the
T-SQL concepts that I haven't yet covered. People
who are more familiar with T-SQL might question the
column definitions I have chosen or the lack of
referential integrity in the tables. Those concepts, while
important, aren't relevant to what I'm covering here.
I'll explore those topics in future lessons. Until then,
please bear with me.


EmployeeID  Average Bonus
1           14250.00
2           5000.00
3           4142.8571
4           5000.00
5           8500.00
6           1000.00
7           8000.00
8           15000.00


Suppose you need to query the Bonus table to 
determine the total bonus amount paid to each 
employee over the past 10 years. In Lesson 3, I showed 
you how to use the SUM function in a SELECT state- 
ment to obtain the total of all the values in a specified 
column. If you were to use this function with the 
Amount column in the Bonus table, you'd get a single 
dollar figure representing the total amount that the 
company paid out in bonuses for the past 10 years. To 
get the total bonus amount paid to each employee over 
the past 10 years, you can use a GROUP BY clause in 
which the specified column is EmployeeID: 


SELECT EmployeeID,
SUM(Amount) AS 'Total Bonus'
FROM Bonus
GROUP BY EmployeeID


As you can see in the results in Figure 1, employee 6 has 
the lowest total bonus payout. If you cross reference 
this result with the data in the Employee table created 
in Lesson 3, you'll see that employee 6 is Napoleon 
Lawrence, one of several employees most recently hired 
in 2006. You'll also see that Napoleon has the lowest 
salary of all the employees. Thus, in all likelihood, his 
total bonus payout is less than the other employees 
because he hasn’t been with the company that long 
and his salary is low. (Typically, employees with lower 
salaries receive lower bonuses.) 

Now let’s go a bit further and determine the average 
bonus per employee along with the number of bonuses 
paid to each employee. You can do this by using the 
AVG and COUNT functions and grouping their 
results by EmployeeID: 


SELECT EmployeeID,
AVG(Amount) AS 'Average Bonus',
COUNT(Amount) AS 'Bonuses Paid'
FROM Bonus
GROUP BY EmployeeID


As the results in Figure 2 show, Napoleon (employee 
6) has indeed earned only two bonuses and his average 
payout is $1,000, which is significantly lower than all 
the other employees. 

EmployeeID  % Higher/Lower than Corp Avg
1            82.98555377200
2           -35.79454253600
3           -46.80119293700
4           -35.79454253600
5             9.14927768800
6           -87.15890850700
7             2.72873194200
8            92.61637239100

Now let’s compare each employee’s average bonus
with the overall average paid out by the company.
You first need to determine the overall average of the
bonuses paid out by the company:


SELECT AVG(Amount)
AS 'Corporate Average'
FROM Bonus


The result is $7787.50. Next, you need to plug 
this corporate average into a calculation that 
determines how each employee's average bonus 
compares with the corporate average in terms 
of percentage: 


SELECT EmployeeID,
AVG(Amount) AS 'Average Bonus',
7787.50 AS 'Corporate Average',
(AVG(Amount) - (7787.50)) / (7787.50)
* 100
AS '% Higher/Lower than Corp Avg'
FROM Bonus
GROUP BY EmployeeID


Figure 3 shows the results. Note that if you wanted 
to calculate the corporate average on the fly, you 
could use several subqueries in the SELECT 
statement. Or even better, you could calculate the 
corporate average and store the result in a local 
variable. You could then reference this variable in 
the SELECT statement instead of hard-coding the 
corporate average. However, this is an advanced 
topic that I'll cover in a later lesson. 
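
As a preview, the local-variable approach might look roughly like the sketch below. The variable name @CorporateAverage and its decimal(19,4) type are assumptions made for illustration, not something taken from this lesson's scripts; choose a type that suits the Amount column in your own Bonus table.

```sql
-- Assumed variable name and type; adjust the type to match Amount.
DECLARE @CorporateAverage decimal(19,4);

-- Compute the overall average once and keep it in the variable.
SELECT @CorporateAverage = AVG(Amount)
FROM Bonus;

-- Reuse the stored value instead of hard-coding 7787.50.
SELECT EmployeeID,
AVG(Amount) AS 'Average Bonus',
@CorporateAverage AS 'Corporate Average',
(AVG(Amount) - @CorporateAverage) / @CorporateAverage * 100
AS '% Higher/Lower than Corp Avg'
FROM Bonus
GROUP BY EmployeeID;
```

Both statements must run in the same batch so that the variable is still in scope. The number of decimal places displayed can differ from Figure 3, depending on the data types involved, and the division fails if the Bonus table's overall average is zero.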


Grouping Data by More Than 
One Column 

When selecting data for a query, you can group by 
more than one column. When you group by more 
than one column, you should include the ORDER 
BY clause to specify how you want the returned data 
sorted. If you don’t use this clause, the data isn’t
returned in any particular order.

For the next examples, let’s switch to the MovieReview
table to see what some employees have been
up to in their (ahem) spare time. Suppose you want to
find out each employee’s favorite types of movies (i.e.,
genres) and the average rating he or she has given them.
To determine the favorite genres, you need to count the
number of times each type of movie was reviewed, which
you can do with the COUNT function. You also need to
use the AVG function to determine the average rating for
each genre. To show the number of movie reviews per
employee per genre, you need to group this data by two
columns: EmployeeID and Genre. To make it easy to see
each employee’s favorite types of movies, you can sort the
data first by the returned values in the EmployeeID column
(column 1) then by the returned values in the Reviews
column (column 3). Here’s what that query looks like:


SELECT EmployeeID, Genre,
COUNT(*) AS 'Reviews',
AVG(Stars) AS 'Average Rating'
FROM MovieReview
GROUP BY EmployeeID, Genre
ORDER BY 1,3

[Figure 4 output: EmployeeID and Genre columns for employees 1 through 5 across the genres Fiction, Sci-Fi, Documentary, Drama, Horror, and Comedy]


As you can see from the results in Figure 4, 
employee 1 has watched many different types of movies
and seemed to enjoy the fiction and horror movies the 
most because they had the highest average rating. You 
can also see that employees 4 and 5 haven’t reviewed as 
many movies as the other employees. 
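
ORDER BY 1,3 sorts by position and in ascending order, so each employee's most-reviewed genre appears last within that employee's rows. A variation that sorts by name and puts the favorite genre first might look like the following sketch. It is not the query that produced Figure 4, and the unquoted Reviews alias is a small change made so that the alias can be referenced directly in ORDER BY.

```sql
-- Same grouping as the query above, but sorted by column name and alias
-- rather than by position. DESC lists each employee's most-reviewed
-- genre first.
SELECT EmployeeID, Genre,
COUNT(*) AS Reviews,
AVG(Stars) AS [Average Rating]
FROM MovieReview
GROUP BY EmployeeID, Genre
ORDER BY EmployeeID, Reviews DESC;
```

Sorting by name also keeps the ordering intact if columns are later added to or removed from the SELECT list.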


Refining Results with the 
HAVING Clause 

The HAVING clause is used to eliminate rows from 
the result set after the data has been aggregated and 
grouped. Any column defined in the SELECT list can 
be referenced in the HAVING clause. You can also 
reference aggregate functions. 
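
To see how HAVING differs from the WHERE clause, consider the following sketch. It assumes that the Genre column of the MovieReview table holds character values such as 'Comedy', as the Figure 4 results suggest; the filter values themselves are purely illustrative.

```sql
-- WHERE removes individual rows before they are grouped;
-- HAVING removes whole groups after the aggregates are computed.
SELECT MovieName,
AVG(Stars) AS 'Average Rating',
COUNT(*) AS 'Reviews'
FROM MovieReview
WHERE Genre = 'Comedy'      -- keep only rows for comedies (illustrative filter)
GROUP BY MovieName
HAVING COUNT(*) >= 2;       -- keep only movies with at least two such reviews
```

A condition on an aggregate, such as COUNT(*) >= 2, can't be placed in the WHERE clause, because the aggregate doesn't exist until the rows have been grouped.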

For example, the results in Figure 4 show that not 
all the employees have reviewed the same number of 
movies. So, let’s produce a report that lists the average 
rating of every movie that has been rated by four or 
more employees. 

As output, you want to see each movie’s name, the 
average number of stars it received, and the number of 
times it was reviewed. To get this output, you need to: 
• Use the GROUP BY clause to group the data by
the MovieName column.
• Use the AVG function to determine the average
number of stars for each movie. However, instead of
displaying the average number of stars as an integer
(boring), you can use the REPLICATE function
to display an asterisk (*) for every star the movie
received. The REPLICATE function repeats a
string value the specified number of times. To make
sure this new Stars column is only 10 characters
long, you need to use the LEFT function, which
returns the specified number of characters
from the left part of the specified string.
• Use the COUNT function to determine how many
reviews each movie received.
• Use the HAVING clause to display only those
movies that were rated by at least four employees.
• Use the ORDER BY clause to sort the movies by
their average rating.

Figure 4: Each employee's favorite types of movies and the average rating he or she has given them

Figure 5: The average ratings for movies with four or more reviews

Figure 6: Movies with the worst ratings (an average rating of three or fewer stars)

MovieName
Meet the Clusters
Planet Of The Ape-Like DBAs
v for Vendor
When Harry Re-Indexed Sally's Table
Bits, Bytes, Videotape
Chariots of Firewire

MovieName
Defragger Hill
Meet the Clusters
Planet Of The Ape-Like DBAs
The User who Knew Too Much
v for Vendor
When Harry Re-Indexed Sally's Table
Bridge Over The River Motherboard


The query would look like this:


SELECT MovieName,
LEFT(REPLICATE('* ',AVG(Stars)),10)
AS 'Stars',
COUNT(*) AS 'Reviews'
FROM MovieReview
GROUP BY MovieName
HAVING COUNT(*) >= 4
ORDER BY Stars


Figure 5 shows the results. 
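
To see what the two string functions do on their own, you can run them against literal values, as in this small sketch (the column aliases are made up for illustration):

```sql
-- REPLICATE repeats the string the given number of times.
SELECT REPLICATE('* ', 3) AS 'Three Stars';            -- '* * * '

-- LEFT keeps only the first 10 characters, so at most five
-- two-character '* ' units are displayed.
SELECT LEFT(REPLICATE('* ', 7), 10) AS 'Capped Stars'; -- '* * * * * '
```

Because each repeated unit is an asterisk followed by a space, the 10-character limit corresponds to five stars. Note also that if Stars is an integer column (an assumption here), AVG(Stars) returns an integer with any fractional part dropped, so a movie averaging 3.75 stars would display three asterisks.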

Now let’s get a list of the movies with the worst
ratings, that is, those movies with an average rating
of three or fewer stars. You can use the same query, except
you need to use a HAVING clause that will display
only those movies that received an average of three or
fewer stars:


SELECT MovieName,
LEFT(REPLICATE('* ',AVG(Stars)),10)
AS 'Stars'
FROM MovieReview
GROUP BY MovieName
HAVING AVG(Stars) <= 3
ORDER BY Stars


As Figure 6 shows, there are some real stinkers.

Finally, let’s see what movies the employees enjoyed the most
by changing the HAVING clause so that it displays only
those movies that had an average rating of four stars or higher:


SELECT MovieName,
LEFT(REPLICATE('* ',AVG(Stars)),10)
AS 'Stars'
FROM MovieReview
GROUP BY MovieName
HAVING AVG(Stars) >= 4
ORDER BY Stars DESC


Notice that the ORDER BY clause sorts the data by 
the number of stars, but this time it displays them in 
descending order. Figure 7 shows the result. 


A Powerful Feature 
As you witnessed in this lesson, the GROUP BY 
clause is a powerful feature of T-SQL. Grouping 
data might have been a foreign concept to you prior 
to this lesson, but now you should be well on your 
way to writing complex, useful reports in record time.
Be sure to rub it in to any COBOL programmers you
might know.
InstantDoc ID 98711

Figure 7: Movies with the best ratings (an average rating of four or more stars)
Mailbox High Availability in Exchange 2007: Learn the Pros and Cons
of Your High-Availability Options 

Exchange 2007 now has several acronyms for high availability—LCR, 
CCR, SCR—not including anything you can do with your storage or CDP 
solutions. Which method is best for you? How can you implement a mix 
of options to make your environment highly available at a price point 
that doesn’t break the bank? 


Transport Rules—Real-World Examples That Can Help Your 
What are transport rules and how do they help you administer your
Exchange environment? You can find many complicated—and largely 
useless—examples on the Internet. We'll show some interesting things 
you can do with message flow and give you real-world examples. 
Surgery or Rocket Science

Some Exchange admins resist PowerShell because they think they can 
complete tasks quicker through the GUI. But we'll present some useful, 
quick, and readily repeatable PowerShell commands that will make your 
job easier rather than your hair grayer. 
Solid State Storage for SQL Server

SSDs set to take the market by storm

Historically, the most widely used types of storage
have been DAS, NAS, SANs, and—more 
recently—iSCSI SANs. Each type has its niche, including 
associated advantages and disadvantages. But a new 
trend in the storage market threatens to blow them all 
out of the water: solid state disk (SSD). For a summary 
of other SQL Server storage options, see the Web sidebar
“Other SQL Server Storage Options” (www.sqlmag.com),
InstantDoc ID 98712.

SSDs are the latest advancement in storage tech- 
nology. The aerospace and military industries have used 
SSD technology since the mid-1990s, when M-Systems 
introduced flash-based SSDs. But recent price drops 
have brought SSDs into the enterprise IT market. SSDs 
are also gaining popularity for use in ultraportable 
notebook computers. SSDs have standard Serial ATA 
(SATA) connections and use solid-state memory to 
store data. In addition to flash memory, SSDs can use 
static RAM (SRAM) or DRAM. Because SSDs mimic 
hard drives, they can easily replace them. 


Advantages 

SSDs provide numerous advantages over previous 
storage options. Because no rotation is involved (i.e., 
no moving parts), data access is faster than with DAS, 
NAS, or SANs—SSD access times range from 10 to 15 
microseconds, which is 250 times faster than hard disk 
drives. A lack of moving parts also means increased 
reliability. In addition, SSDs use less power than 
other storage options. BiTMICRO Networks’ Product
Officer Ces Martorillas notes that “from a cost-benefit 
standpoint, SSDs can replace approximately 200 
HDDs ... not to mention the savings on power and 
cooling energy required per device.” SanDisk, a leading
SSD storage vendor, asserts that “SSD is rugged, fast,
and power efficient. It’s just what you need ... to drive
your business more successfully.” One of the most crucial
SQL Server database performance factors is I/O;
SSDs’ faster data access times give them an important
advantage over other storage mechanisms.


LEARNING PATH
SQL SERVER MAGAZINE RESOURCES

Learn about other storage options:
“iSCSI SANs for SMBs,” InstantDoc ID 97607
“Network Attached Storage,” InstantDoc ID 92718
“SQL Server on a SAN,” InstantDoc ID 48486
“What's the Best Way to Carve Up a SAN?” InstantDoc ID 96555

SQL Server Magazine • www.sqlmag.com


Disadvantages 
Traditionally, SSDs have had a fairly high cost per I/O 
operation—although recent advances in flash tech- 
nology are lowering those costs. EMC spokesperson 
Colin Boroski notes that “over the next few years we 
expect flash drive prices will decline at a faster rate 
than traditional Fibre Channel drives due to the rapid 
advances being made in semiconductor manufacturing 
technologies and the natural effects of increased vol- 
umes in the market.” 

In the past, SSD storage capacity was limited. One workaround
for this problem is to store only your most frequently
accessed data (e.g., tables, database components) on SSD.
However, BiTMICRO Networks, which already offers 256GB
SSDs, has announced plans to release SSDs with up to
832GB capacity in third quarter 2008. And the
company claims to have packed a whopping
1.6TB into their E-Disk Altima E3S320 SSD.


Lavon Peters (peters @ windowsitpro.com) is a senior
editor for Windows IT Pro and SQL Server
Magazine, specializing in storage, backup and
recovery, and mobile/wireless solutions. She
has worked as a technical editor since 1994.




MORE on the WEB
See the Web-exclusive sidebar at InstantDoc ID 98712.


When To Use SSDs 

According to Texas Memory Systems, the SQL
Server environments that can benefit the most
from SSDs are those in which the servers have
long I/O wait times. In the white paper “Faster
SQL Server Access with Solid State Disks”
(www.texmemsys.com/files/f000174.pdf), the company
recommends that you investigate the following SQL
Server database components to determine the cause
of increased I/O:

• The entire database: Look for excessively large
databases, concurrent access by many users, or users
frequently accessing all the tables in the database.
• Transaction logs: Look for a high number of
entries, which occur during write transactions and
increase a database's I/O time.
• The temporary database: This database stores
temporary data during many types of operations;
complex operations can complete more quickly if
the temporary database is on SSD.
• Indexes: SQL Server updates table indexes every
time it adds or modifies a record, and it accesses
these indexes during each read transaction, which
results in frequent, small, and random transactions,
and thus increased I/O time.
• Frequently accessed tables: As with indexes, when
users access tables frequently, random data requests
occur. Random requests translate directly into
higher I/O wait times.


Moving to SSD can alleviate all these causes of I/O 
delay and improve SQL Server performance. 


Other Considerations 

Several additional factors can come into play in the
decision to move from traditional storage to SSD
in a SQL Server environment. The size of an organization
and the extent of its SQL Server use are
important, if only indirectly. For example, small companies
might not have the budget to implement SSDs; traditionally,
SSD prices have been five times that of standard
SATA drives. In addition, Mark Hayashida, CTO
of Solid Data Systems, notes that “typically,
larger companies tend to utilize a greater number
of enterprise-class applications that benefit greatly by
deploying SSDs within their storage pool.”

The most important thing to consider is the par- 
ticular SQL Server applications that your organization 
employs. Eric Schott, the senior director of product 
management for Dell EqualLogic Storage, says the 
decision to switch to SSD is “a function of the per- 
formance requirements of the SQL application (data- 
base), and the importance of the application to the 
business.” The type of database (i.e., OLAP or OLTP) 
isn’t a factor—whether SSD will benefit a SQL Server 
organization depends on the types of transactions 
within that database. 

Virtualization, as you might expect, is a factor 
you'll want to consider before implementing SSDs in 
your SQL Server environment. The performance gains 
achieved by implementing SSDs are magnified in a 
virtualized SQL Server environment. In “The
High Performance SAN Alliance: SAN, SSD,
and Virtualization”
(www.chicorporation.com/pdf/SAN/The%20High%20Performance%20SAN%20Alliance%20SAN_SSD_%20and%20Virtualization.pdf),
Texas Memory Systems notes that
“a superior storage virtualization solution simplifies 
storage provisioning and reduces administrative over- 
head. It also enables and simplifies the targeted provi- 
sioning of resources, so that the fastest storage (e.g., 
SSD) can be provisioned to those applications that 
need it, when they need it, for maximum performance.” 
Eric Schott, of Dell, echoes, “Storage virtualization 
helps make it easier for the administrator to place 
appropriate data on the faster storage devices—storage 
virtualization lends to easier management.” 

And as BiTMICRO Networks’ Martorillas says,
“A superior storage virtualization solution simplifies 
the complexity in handling and managing storage 
needs for every application. With virtualized storage, 
applications requiring high levels of performance can 
be provisioned with the fastest storage, like SSD, in 
their storage pool. Virtualized SSDs, like any other 
virtualized storage, can be allocated and de-allocated 
depending on customer needs. Applications may only 
require a high level of performance for a certain period 
of time, so SSDs allocated in these applications may be 
re-assigned to other applications when needed.” Such 
flexibility is useful in a SQL Server environment in 
which applications periodically generate high volumes 
of transactions. 


An Emerging Alternative 
SSDs are no longer limited to government or niche 
markets. They're widely available from several vendors 
for use in enterprise applications. SQL Server envi- 
ronments can especially benefit from SSDs, because 
SQL Server is such an I/O-intensive application. As 
the price of these devices continues to drop, and their 
storage capacity continues to increase, they present an 
affordable, high-performance alternative to traditional
storage options.
SteelEye LifeKeeper Protection Suite for SQL Server


LifeKeeper Protection Suite for SQL Server from
SteelEye Technology protects SQL Server data
and ensures disaster recovery and high availability by 
letting you create failover clusters comprising local 
and remote standby servers. I tested LifeKeeper Pro- 
tection Suite for SQL Server (LPS-SQL) 6.1.2, which 
combines SteelEye Data Replication (SDR) volume 
replication and the LifeKeeper for Windows high avail- 
ability feature set, with added support for SQL Server. 
Installation of the two products is integrated with a 
single setup routine, although the two products install 
as separate services, with separate documentation and 
separate management interfaces. 

Key features include block-oriented synchronous
and asynchronous volume replication, automatic and
manual failover modes supporting shared or replicated
storage on both physical and virtual servers, and
continuous data protection (CDP) within the recovery
feature set. LPS-SQL supports x86 and x64 (but not
IA64) versions of Windows Server 2003 and Windows
2000 Server and protects SQL Server 2005 and SQL
Server 2000.


LIFEKEEPER PROTECTION SUITE FOR SQL SERVER

Pros: Broad feature set supports complex failover
scenarios that include multiple remote and local
servers as well as both shared and replicated storage;
flexible local and remote administration; easy-to-execute
manual failover and fail-back processes;
reliable automatic failover

Cons: Rewind feature displays only some of the
available rewind points and isn’t SQL Server transaction
aware; documentation for components isn’t
integrated and lacks depth

Price: $3,280 per server, $820 annual support fee

Recommendation: If you can live with weak data
rewind features, I recommend that you add LPS-SQL
to your short list when looking for a flexible,
easy-to-implement high availability solution.

Contact: SteelEye Technology • 866-318-0108 •
www.steeleye.com

When creating the protected resource on the primary
server, LPS-SQL saw that the databases were
on the system volume. Since system volumes
can’t be part of a failover hierarchy,
the wizard relocated them to a volume I
designated. This is a great new feature in
LPS-SQL. Without LPS-SQL’s ability to
automatically migrate SQL Server's system databases
to a volume eligible to be a protected resource, you'd
have to do this manually, which isn’t a trivial task.

To protect multiple instances of SQL Server, you 
define a hierarchy for each instance and select the 
databases to be protected. Each server within a cluster 
can be the active server for one or more SQL Server 
instances and be a standby server for other instances. 
Although a cluster might include servers local to or
remote from the primary application server, configuring
failover to a remote SQL Server system currently
involves a minor workaround and the use of a
protected DNS resource. SteelEye plans full support
for remote configuration in the next release.

Because LPS-SQL monitors and logs I/O activity 
at the level of a volume rather than a database, it isn’t 
aware of SQL Server transactions, and its CDP feature, 
Rewind, doesn’t use transaction logs when rewinding 
data. If you're looking for very granular data-rewind 
points, perhaps this isn’t the product for you. 

Overall, I liked LPS-SQL’s ease of configuration. 
The documentation is well organized, though it lacks 
depth and doesn’t integrate the data replication and 
high availability components well. The product’s sup- 
port for failover to remote systems and its ability to 
incorporate a variety of hardware configurations into 
the same cluster make it very flexible. If you don’t mind
a lack of granularity in the data-rewind feature, I suggest
you bring LPS-SQL in for a test.
HP ProLiant DL160


HP ProLiant DL160 is the first of the new-generation
high-performance 1U servers. The
DL160 is a two-socket system that puts dual quad-core
power into a tiny 1U form factor. It supports a total of 
32GB of RAM and up to 3TB of storage. Unlike other 
units in the ProLiant line, the DL160 doesn’t have HP’s 
Insight Manager, which would have provided system 
monitoring, alerting, and asset management. I ran the 
system using both the preinstalled Windows Server 
2003 Enterprise x64 Edition and Windows Server 2008 
Enterprise x64 Edition, which I installed. 

I found the unit reasonably light and easy to install 
into the rack. It’s rather loud, especially when powering 
on, as the fans all shift into high gear. It quiets down 
after it’s running. Overall performance was exceptional. 
It was a fast file server, and its dual quad-core proces- 
sors really made it shine as a Web application server. 
However, I was very surprised to find that although the 
system shipped with virtualization-capable processors, 
virtualization support was removed from the system’s 
BIOS. This deficiency rendered the system incapable 
of running Server 2008 Hyper-V even though Server 
2008 is one of the supported OSs. The DL160 would
be a great Web server, database server, or general file
server for a small-to-midsized business (SMB). But if
you need a small-scale virtualization server for Server
2008, you should look elsewhere.
HP PROLIANT DL160

Pros: 1U form factor; high performance; 64-bit
compatibility

Cons: No support for Hyper-V virtualization; no
optical drive in test unit

Price: Starts at $1,379; tested configuration,
$6,970. Base price includes one quad-core Xeon
2GHz processor, 1GB of RAM, and one 80GB
7,200rpm hard drive. Tested configuration included
two quad-core Xeon processors, 8GB of RAM, and
four 146GB 15,000rpm drives, but no optical drive.

Recommendation: Choose the DL160 if you
need a Web server or a general-purpose file server
for an SMB, but realize that you can’t use it with
Windows Server 2008 Hyper-V virtualization.

Contact: HP • www.hp.com • 800-752-0900
Bytes from the Blog
www.sqlmag.com/go/industrybytes


Sun Acquires MySQL AB: Now What?


As you may have heard, Sun Microsystems
ponied up a cool $1 billion for
open-source database developer MySQL AB. MySQL
has been an integral aspect of many Web develop- 
ment products, and has emerged as the world's most 
popular open-source database software. MySQL AB 
claims that more than 100 million copies of MySQL 
have been downloaded over the years, and the MySQL 
developer community is large and vocal. 

The success of MySQL can be partly
attributed to the past few years’ rapid growth in
database-driven Web applications, many of which rely 
on MySQL and PHP. MySQL has also been widely 
adopted by many large enterprises for internal use; the 
MySQL Web site lists dozens of corporate giants that 
use their software, from Airbus/EADS to Apple and 
Sears to Citrix. MySQL has also been available for a 
huge variety of platforms, from Mac OS X to multiple 
variants of Windows and a plethora of UNIX and 
Linux distributions. MySQL is currently offered in two 
editions, which both share the same code base: Com- 
munity Server and Enterprise Server. The impressive 
amount of adoption that MySQL enjoys is undoubt- 
edly one of the reasons why Sun came knocking. 

In a recent statement, Zack Urlocker, MySQL AB’s 
executive vice president of products, reiterated MySQL’s 
status as the ubiquitous database of choice for many 
online applications. “We are proud that MySQL is 
established as the ‘de facto’ database for modern online 
applications—for both large and small Web proper- 
ties,” Urlocker said. “While many of the world’s largest 
Internet-based companies rely upon MySQL, our data- 
base is also helping innumerable start-ups handle their 
explosive growth to reach their business goals.” 

It’s clear that MySQL is a widely used and popular
open-source database, but what does Sun gain from their
$1 billion acquisition? Given that Sun has only been an
interested onlooker in such markets as enterprise databases,
data warehousing, and business intelligence (BI),
the MySQL acquisition could do wonders for Sun in
those markets: MySQL is a contributing technology in
all of those spaces, and MySQL immediately gives Sun a
seat at the table in these emerging, high-growth markets.

Sun’s Java is also a perfect companion to MySQL, 
given that both technologies are used extensively in 
next-generation Web applications. Sun also could use 
MySQL as a great value-add story for their bundling 
efforts: When talking to prospective customers about 
what they might like to choose from the Sun Microsys- 
tems product menu, MySQL opens up a new page of 
entrees. Sun Microsystems CEO and president Jonathan 
Schwartz seems to think so, and said as much when the 
deal was announced. “[The MySQL] acquisition reaffirms 
Sun’s position at the center of the global Web economy. 
Supporting our overall growth plan, acquiring MySQL 
amplifies our investments in the technologies demanded 
by those driving extreme growth and efficiency, from 
Internet media titans to the world’s largest traditional 
enterprises,” said Schwartz. “MySQL’s employees and
culture, along with its near ubiquity across the Web, 
make it an ideal fit with Sun’s open approach to network 
innovation. And most importantly, this announcement 
boosts our investments into the communities at the heart 
of innovation on the Internet and of enterprises that rely 
on technology as a competitive weapon.” 

The idea of large companies buying open-source 
software developers certainly isn’t new: Citrix gobbled up 
XenSource a few months prior to the Sun/MySQL deal, 
and the open-source developer grab is likely to continue. 
On the downside, integrating any new product acquisi- 
tion into a large enterprise like Sun can always be a chal- 
lenge. And then there’s the question of the large, vocal, 
and passionate MySQL user community. Will they stay 
as committed to MySQL given that the product is now 
joined at the hip with a large, for-profit corporation? 

Only time will answer these key questions, but Sun
and MySQL have definitely stirred the pot and
disrupted the status quo in the enterprise relational
database management system (RDBMS) market. And
from where I stand, roiling the waters a bit seems like
a pretty good thing.
Fujitsu Computer Systems Corporation has released NeoData, an application that facilitates the migration of data
from legacy mainframe applications to SQL Server. According to Fujitsu, NeoData allows virtual storage access
method (VSAM) data systems to be transferred to SQL Server, thanks in part to a data mapping tool that troubleshoots
VSAM to SQL migrations. Fujitsu claims that NeoData can resolve common migration issues, ranging from
COBOL OCCURS clause issues to problems with multiple record formats. A management GUI streamlines the
VSAM to SQL Server migration process, which Fujitsu says can help minimize the need to rewrite old COBOL code
and help get DBAs using legacy data in a SQL Server environment more quickly. For more information, contact
Fujitsu Computer Systems Corporation at cobol@netcobol.com or visit www.netcobol.com.


TRAINING 

SQL Server 2005 Instructional DVD 

SQL Server Magazine Contributing Editor Kalen Delaney has announced the
immediate availability of the first lesson of her SQL Server 2005 Internals, Architecture,
and Tuning: Lesson 1 on DVD. According to Delaney, this instructional DVD
includes more than two hours of training material drawn from her advanced SQL
Server courses. Lesson 1 includes demonstrations of SQL Server features such as
SQL Server 2005 Architecture, Dynamic Management Views and Functions, and
SQL Server metadata. The Lesson 1 DVD costs $29 plus shipping, and discounts are
available for bulk orders. For more information, visit www.SQLServerDVD.com.


AUDITING AND COMPLIANCE 

SQL Server Change Monitoring 

NetPro Computing, Inc. has announced ChangeAuditor 4.5, the latest version of the vendor’s auditing and compliance
product that now features a new SQL Server auditing module. This new module supports SQL Server 2005
and 2000, and introduces hundreds of new audit events for SQL database auditing. According to NetPro, this new
release allows DBAs to configure security, auditing, and reporting features themselves, or they can rely on pre-defined
reports and searches for the most common auditing and regulatory requirements. For more
information, contact NetPro Computing at sales@netpro.com or visit www.netpro.com.


PERFORMANCE MONITORING 

SQL Server Optimization

Improving the performance of SQL Server databases is the focus of Confio Software’s
newly released Confio Igniter Suite PI for SQL Server. This new product provides agentless
monitoring of SQL Server databases, and includes the Ignite Performance Warehouse,
an architecture designed to facilitate the storage and analysis of performance data.
Confio also touts the ability of the product to analyze and optimize end user wait states,
which Confio says are one of the leading causes of slow database performance. For more
information, contact Confio Software at info@confio.com or visit www.confio.com.


SECURITY 


Enterprise Database Protection 
Secerno has released Secerno.SQL 2.1, an update to its enterprise database security
management product. Secerno.SQL 2.1 allows administrators to create and enforce security policies for shared 
databases across the enterprise. DBAs can set access restrictions by date, location, and access method, and Secerno 
claims that the product “also uniquely controls individuals’ behavior whilst accessing [the database]”, all without 
the need to create elaborate policy settings. A new Direct Console Access Monitoring (DCAM) feature introduces 
enhanced auditing functionality and improved control and monitoring of privileged database users. For more 
information, contact Secerno at enquiries@secerno.com or visit www.secerno.com.
SQL Server Data Services isn’t some
new SQL Server 2008 feature that
you might have glossed over. Nor is it an edition 
of SQL Server 2008 that you haven’t heard about. 
SQL Server Data Services is the database version 
of Microsoft’s new Software Plus Services strategy, 
which combines software with remote services via 
the Internet. Here are the top five things you need 
to know about SQL Server Data Services. 


What is SQL Server Data 
Services? 

Fulfilling Microsoft’s vision of becoming a Soft- 
ware Plus Services provider, SQL Server Data 
Services provides a services wrapper for data access 
that’s targeted toward businesses of all sizes. It’s 
an on-demand data storage and retrieval service 
offering data access anywhere via the Web. 


How much data and what 
kind of data can be stored? 

At press time, all of the details regarding the 
internal capabilities of SQL Server Data Services 
had yet to be announced. However, Microsoft has 
stated that SQL Server Data Services will support 
several simple data types including string, numeric, 
datetime, and Boolean. At this point, Microsoft 
has placed no limits on the amount of data that 
can be stored. 


How is the data accessed? 

SQL Server Data Services supports XML standards- 
based Representational State Transfer (REST) and Simple 
Object Access Protocol (SOAP) interfaces. Microsoft 
will also provide both a Visual Basic (VB) and C# client 
library, enabling Language-Integrated Query (LINQ)- 
based data access. LINQ queries support full-text search 
and paging queries. 


Where is the data stored and how 
is it protected?

SQL Server Data Services data is stored in Microsoft’s own 
SQL Server systems. Customers can store and manage 
multiple copies of data, and Microsoft provides geo- 
redundant data for disaster recovery and to protect against 
regional and site outages. All data transfers between the 
client and SQL Server Data Services are secured by Secure 
Sockets Layer (SSL). 


What is the price for SQL Server 
Data Services? 

SQL Server Data Services is still in beta, so the price hasn’t 
yet been determined. However, it would be safe to bet that 
Microsoft will charge for the service based on the amount 
of storage used. Find out more about SQL Server Data 
Services and register for the beta at
www.microsoft.com/sql/dataservices/default.mspx.
well-be-a-monopoly is up to. Get the background
on Microsoft’s Software Plus Services, read how
it’s manifesting, and learn what this new approach 
could mean for you. 


Will Software Plus Services add to your success or 
just to Microsoft’s wallet? Send me your thoughts at
christan.humphries@penton.com.